
Backtesting Expected Shortfall:

The Case of the Dhaka Stock Exchange

Reza Al Saad

Thesis for the degree of Master of Science (MSc)


at the Christian-Albrechts-Universität zu Kiel, Germany

Kiel
December, 2016
Backtesting Expected Shortfall:
The Case of the Dhaka Stock Exchange

Master's Thesis

for the Master's degree programme

Quantitative Finance

in the Institute for Statistics and Econometrics

at the Christian-Albrechts-Universität zu Kiel

Submitted by

Reza Al Saad

ID: 1000855

Supervised by

First Supervisor: Prof. Dr. Matei Demetrescu

Second Supervisor: Prof. Dr. Markus Haas

Kiel

December, 2016
Backtesting Expected Shortfall:

The Case of the Dhaka Stock Exchange

Reza Al Saad
stu106082@mail.uni-kiel.de

Abstract
The formal measurement of risk is increasingly in demand for an emerging stock
market like the Dhaka Stock Exchange (DSE). Hence, this article focuses on
backtesting Expected Shortfall (ES), a new and contemporary method of assessing
the loss quantiles which requires the calculation of both Value-at-Risk (VaR)
and Expected Shortfall. Different GARCH models have been used to forecast future
returns based on the data from 2002 to 2009. The distributional assumption is
also important because of the skewness of the returns; therefore, the Normal and
Student's t-distribution have both been used in the GARCH models. The Student's
t-distribution produces better forecasts than the Normal distribution. The three
backtesting methods which have been introduced so far were performed on the
forecasted returns, and some limitations of these methods have been identified.
Given the insufficient work on both forecasting future returns and implementing
risk measurement methods, this article should be very useful for future
researchers as well as for risk managers and investors.
Contents
1 Introduction 1
2 Risk Measurements 3
2.1 Mathematical Properties . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.3 Subadditivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.4 Positive Homogeneity . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.5 Translation Invariance . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Value-at-Risk (VaR) . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Expected Shortfall (ES) . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Elicitability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.5 Parametric Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.1 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.2 Student's t-distribution . . . . . . . . . . . . . . . . . . . . . . 7
2.6 BASEL III Accords . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Modelling Risk Measures 9


3.1 Gaussian Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Historical Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Volatility Specification 11
4.1 ARMA(1,1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Simple ARCH(1,1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Extension to Standard GARCH . . . . . . . . . . . . . . . . . . . . . 12
4.4 Exponential GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.5 Integrated GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.6 GJR - GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.7 Component GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

5 Backtesting Methodologies 14
5.1 Wong's Saddlepoint Approach . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Approaches of Acerbi and Szekely . . . . . . . . . . . . . . . . . . . . 16
5.2.1 First Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2.2 Second Approach . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2.3 Third Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.3 Costanzino and Curran's Approaches . . . . . . . . . . . . . . . . . . 17
5.3.1 Spectral Risk Measure . . . . . . . . . . . . . . . . . . . . . . 17
5.3.2 Defining Test Statistic . . . . . . . . . . . . . . . . . . . . . . 18

6 Empirical Results 19
6.1 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.2 Analysis of Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.3 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.4 VaR Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.5 Expected Shortfall: Backtesting Results . . . . . . . . . . . . . . . . 25

7 Conclusion 31

List of Figures
1 VaR at 1% significance level when returns are Normally distributed. . 1
2 Indices of DSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Log Returns of the DSE . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 Test for Heteroskedasticity . . . . . . . . . . . . . . . . . . . . . . . . 21
5 QQ Plots of DSE Log Returns and Standardized Log Returns . . . . 22
6 Value-at-Risk for different ARMA(1,1)-GARCH(1,1) Models. . . . . . 24

List of Tables
1 Descriptive Statistics of Log Returns . . . . . . . . . . . . . . . . . . 19
2 Goodness-of-Fit Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Box-Ljung Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 Parameter Estimates of ARMA(1,1)-GARCH(1,1) models. . . . . . . 23
5 Summary of VaR Calculations . . . . . . . . . . . . . . . . . . . . . . 25
6 P-values for Different Methods under AR(1)-GARCH(1,1) with Normal
Distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7 P-values for Different Methods under AR(1)-GARCH(1,1) with Stu-
dent's t-Distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
8 P-values for Different Methods under ARMA(1,1)-GARCH(1,1) with
Normal Distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
9 P-values for Different Methods under ARMA(1,1)-GARCH(1,1) with
Student's t-Distribution. . . . . . . . . . . . . . . . . . . . . . . . . . 30

Bibliography

Appendices

Appendix - 1: R-Codes for the Descriptive Statistics and VaR-ES Calculations


Appendix - 2: Normal Distribution: Window - 1 for ARMA(1,1)-GARCH(1,1)
Appendix - 3: Normal Distribution: Window - 2 for ARMA(1,1)-GARCH(1,1)
Appendix - 4: Student's t-distribution: Window - 1 for ARMA(1,1)-GARCH(1,1)
Appendix - 5: Student's t-distribution: Window - 2 for ARMA(1,1)-GARCH(1,1)
Appendix - 6: Normal distribution: Window - 1 for AR(1)-GARCH(1,1)
Appendix - 7: Normal distribution: Window - 2 for AR(1)-GARCH(1,1)
Appendix - 8: Student's t-distribution: Window - 1 for AR(1)-GARCH(1,1)
Appendix - 9: Student's t-distribution: Window - 2 for AR(1)-GARCH(1,1)

1 Introduction
Risk management involves probabilities, statistics and complex processes. Whether
it is a personal choice, a project evaluation or a managerial decision, risk
management is the process of setting a goal in a specific environment,
identifying the factors which may affect it and analysing them. These steps lead
us to different models for estimating risk. Recent economic crises have also
compelled us to choose better and improved measurements. There are many
different risk measures, of which Value-at-Risk (VaR) is used most frequently.

We start with a simple example. Suppose we have a project and we state with 99%
confidence that we will not lose more than 10 million euro. In other words,
there is a 1% chance that we lose more than that amount. The limit here, 10
million euro, is the Value-at-Risk barrier, and the probability of 1% may vary
depending on the aspects, attributes and riskiness of the project. We have
certainly derived the above statement from a risk model. The probability
indicates that, out of 100 days, there will be only 1 day on which the loss
exceeds the VaR. Now, say the losses exceed the amount of 10 million euro more
than once. Then we have somehow underestimated the risk, which calls for a model
evaluation, or backtesting.

Now, backtesting VaR is not a difficult thing to do, but the controversy is that
VaR cannot capture tail risk. VaR certainly determines a limit, but what happens
when the loss exceeds that limit? Where will it stop? So the main concern is the
exceedance of losses, and VaR cannot describe what happens beyond the barrier.

Figure 1: VaR at 1% significance level when returns are Normally distributed.

This criticism of VaR has opened the door for another risk measure, called
Conditional Value-at-Risk (CVaR) or Expected Shortfall (ES). It is the expected
loss conditional on the loss exceeding the VaR. As Expected Shortfall works on
the tail of the distribution, it gives us a fairly good estimate of the expected
loss. The BASEL Committee suggested substituting 97.5% Expected Shortfall for
99% VaR, but it still sticks with backtesting VaR rather than Expected Shortfall.

As mentioned above, backtesting VaR is fairly uncomplicated and is done by
comparing the number of exceedances. However, there were no such methods to
backtest Expected Shortfall because of a major drawback: unlike VaR, Expected
Shortfall does not carry the mathematical property of elicitability (Gneiting
2011). Elicitability means that a risk measure admits a scoring function for
comparing different models, which is what makes VaR elicitable; it is closely
connected to model selection. Despite this non-elicitability, Acerbi and Szekely
argued that Expected Shortfall does not need to be elicitable in order to be
backtested.

So the aim of this thesis is to backtest Expected Shortfall. We have chosen the
methods proposed by Wong, by Acerbi and Szekely, and by Costanzino and Curran.
We will forecast the out-of-sample data using sGARCH, eGARCH, iGARCH, GJR-GARCH
and Component-GARCH for two windows of 250 days each. The ultimate goal is to
compare the different backtesting methods across the different GARCH models.
Chapter 2 formally introduces VaR and Expected Shortfall and their mathematical
properties along with different parametric assumptions. Chapter 3 describes
different estimation methods for VaR and ES. Chapter 4 gives a brief description
of the above-mentioned GARCH models, and Chapter 5 describes the above-mentioned
backtesting methods. We implement those methods and observe the results in
Chapter 6, while Chapter 7 concludes the thesis.

2 Risk Measurements
In this chapter, we discuss different risk measures along with their properties
and what makes them so significant. We also have a look at why there was a
notion of Expected Shortfall not being backtestable.

2.1 Mathematical Properties

Let X be a random variable which defines the price of a portfolio. According to
Artzner et al. (1999), a function ρ is said to be a coherent risk measure if it
satisfies the following properties:

2.1.1 Normalization
ρ(0) = 0

This property tells us that if there are no assets in a portfolio then the
portfolio does not exist, and if we do not hold a portfolio then there is no
risk involved.

2.1.2 Monotonicity
Let us have two portfolios, X and X′. If the value of portfolio X is larger than
that of X′, then portfolio X will carry less risk than the other portfolio. If
the set of portfolios 𝕏 is a linear space and X, X′ ∈ 𝕏, then for X ≥ X′:

ρ(X) ≤ ρ(X′)

2.1.3 Subadditivity
This property simply tells us that combining two different portfolios never
increases the total amount of risk. This is also referred to as the
diversification of the portfolio.

ρ(X + X′) ≤ ρ(X) + ρ(X′)

2.1.4 Positive Homogeneity


For λ ≥ 0,

ρ(λX) = λ ρ(X)

If a portfolio needs to be converted to cash, the speed of conversion certainly
depends proportionally on the size of the portfolio. So an increase or decrease
in the size also increases or decreases the amount of risk involved.

2.1.5 Translation Invariance
Suppose Y is a riskless portfolio which guarantees a certain return y. Adding it
to a risky portfolio X is like adding liquid money, which works as a backup
against the risk of the portfolio and thus reduces it:

ρ(X + Y) = ρ(X) − y

Subadditivity and Positive Homogeneity together yield a new property called
convexity. If X, X′ ∈ 𝕏 and A ∈ [0, 1], then

ρ(AX + (1 − A)X′) ≤ A ρ(X) + (1 − A) ρ(X′)

2.2 Value-at-Risk (VaR)

Let us observe a portfolio of risky assets. If L is the loss with distribution
function F_L(l) = P(L ≤ l), then the maximum loss is inf{l ∈ ℝ : F_L(l) = 1},
which may be unbounded. Hence we consider a confidence level to determine a
boundary for F_L. If we consider VaR at the 99% confidence level, then α = 0.99
and the significance level is 1 − α = 1%. For α ∈ (0, 1), we define VaR as:

VaR_α(L_{n+1}) = inf{l ∈ ℝ : P(L_{n+1} > l) ≤ 1 − α}
             = inf{l ∈ ℝ : 1 − F_L(l) ≤ 1 − α}
             = inf{l ∈ ℝ : F_L(l) ≥ α}

Here, L_{n+1} is the loss in period n + 1. Hence, α is the probability of losses
being smaller than VaR:

P_n(L_{n+1} ≤ VaR_α(L_{n+1})) = α

So VaR depends not only on the confidence level but also on the time horizon.
One of the pillars of the BASEL II accords is to prefer the VaR at the 1%
significance level in the case of market risk. If the function F_L : ℝ → ℝ is
increasing, its generalized inverse is

F_L^←(α) := inf{x ∈ ℝ : F_L(x) ≥ α}

For the cumulative distribution function F_L, the generalized inverse F_L^← is
also the quantile function:

q_α(F_L) = F_L^←(α)

Here, q_α(F_L) is the α-quantile of F_L (McNeil et al. 2005). So we can write
the following:

VaR_α(L_{n+1}) = q_α(F_L)

Now we consider the mathematical properties for Value-at-Risk. VaR does not have
the property of convexity because it does not carry the property of
subadditivity. Suppose L and L̃ are the losses of two portfolios. When the two
losses are combined, the following does not hold in general:

VaR_α(L_{n+1} + L̃_{n+1}) ≤ VaR_α(L_{n+1}) + VaR_α(L̃_{n+1})

Hence, diversification of the portfolio is not rewarded by VaR.
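A small numerical example, not taken from the thesis, makes this failure
concrete: two independent bonds, each losing 100 with probability 4%, have a 95%
VaR of 0 individually, yet the 95% VaR of the combined portfolio is 100.

```python
import numpy as np

def var_discrete(losses, probs, alpha):
    """alpha-quantile of a discrete loss distribution: inf{l : P(L <= l) >= alpha}."""
    order = np.argsort(losses)
    losses, probs = np.asarray(losses)[order], np.asarray(probs)[order]
    cdf = np.cumsum(probs)
    return losses[np.searchsorted(cdf, alpha)]

alpha = 0.95
# one bond: loss 100 with probability 0.04, so P(L <= 0) = 0.96 >= alpha
var_single = var_discrete([0, 100], [0.96, 0.04], alpha)
# two independent bonds: at least one defaults with probability 1 - 0.96**2 = 0.0784
probs_sum = [0.96 ** 2, 2 * 0.96 * 0.04, 0.04 ** 2]
var_sum = var_discrete([0, 100, 200], probs_sum, alpha)
```

Here var_single is 0 but var_sum is 100, so VaR_0.95(L + L̃) > VaR_0.95(L) +
VaR_0.95(L̃): subadditivity fails.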

2.3 Expected Shortfall (ES)

The concept of Expected Shortfall was introduced by Artzner et al. (1997, 1999).
Expected Shortfall is defined as the expected loss given that the loss has
exceeded Value-at-Risk. For α ∈ (0, 1) we can write:

ES_α(L_{n+1}) = E_n[L_{n+1} | L_{n+1} ≥ VaR_α(L_{n+1})]

The above equation holds if the loss follows a continuous distribution. If VaR
is known at different levels, ES is calculated via:

ES_α(L_{n+1}) = (1/(1 − α)) ∫_α^1 VaR_p(L_{n+1}) dp

or  ES_α(L_{n+1}) = (1/(1 − α)) ∫_α^1 q_p(F_L) dp

The above is derived from the definition of VaR: this is the average VaR from α
to 1 (McNeil et al. 2005). As F_L is continuous, the condition in the definition
is:

P(L ≥ VaR_α(L)) = 1 − α

Hence, ES is also known as Conditional Value-at-Risk (CVaR), Average
Value-at-Risk (AVaR) or Expected Tail Loss (ETL). ES possesses the properties of
a coherent risk measure. Though it is derived from VaR, it still has the
subadditivity property, e.g. for two portfolios X and Y the following holds for
α ∈ (0, 1):

ES_α(X + Y) ≤ ES_α(X) + ES_α(Y)

Several proofs can be found in Embrechts et al. (2015). Danielsson (2011) and
Artzner et al. (1999) both described the limitations of VaR: first of all, VaR
informs us about the minimum tail loss but does not tell us anything beyond that
level; secondly, VaR loses the subadditivity property for heavy-tailed return
distributions (Danielsson et al. 2013). For our case we will analyse this
statement regarding the returns of the Dhaka Stock Exchange indices later on. In
contrast, ES overcomes those limitations of VaR. But the problem was that ES was
considered not backtestable since, unlike VaR, it is not elicitable (Gneiting
2011).

2.4 Elicitability

Financial positions and forecasts are usually estimated from historical data.
Different estimation procedures will certainly produce different results; the
question, however, is about efficiency. To determine efficiency, a comparison is
made across different methods, and risk measurement procedures also have to
undergo the same process. But this verification process is not well grounded for
all the measures. The measures which are testable by such a process are called
elicitable, and Value-at-Risk is one of them (Ziegel, 2013). According to
Gneiting (2011), if Y is a real-valued random variable, then a scoring function

S : ℝ × ℝ → [0, ∞)

is called strictly consistent for a functional ν with respect to a class of
distributions P if

E(S(ν(P), Y)) < E(S(r, Y))  for all r ≠ ν(P)

A functional ν is elicitable with respect to P if the above condition holds for
some scoring function. So an optimal forecast x̂ for ν(P) is given by:

x̂ = argmin_x E_P[S(x, Y)]

But there exists no such function with E[S(Y, ES_α)] = 0 for ES alone. If VaR is
also included in the function, there might be a chance to develop a scoring
function for ES. According to Acerbi et al. (2014), there is a one-parameter
family of scoring functions

S^W(v, e, x) = e²/2 + W v²/2 − ev + (e(v + x) + W(x² − v²)/2) · 1{x + v < 0}

for every W ∈ ℝ, that jointly elicits VaR and ES:

{VaR_α, ES_α} = argmin_{v,e} E[S^W(v, e, X)]

It has not yet been proven whether the above is true in general. However, ES is
still backtestable even if it is not elicitable (Acerbi et al. 2014).

2.5 Parametric Assumption

The debate over whether returns follow a Normal distribution has been going on
for more than two decades. Fama (1963) and Mandelbrot (1963) said that the
distribution of sums of IID random variables converges to one of the members of
the Lévy-stable family. That distribution would be Normal due to the
stochasticity of the returns. The selection also depends on whether the variance
is finite or infinite. Depending on the variance, if the returns show a kurtosis
larger than 3 then those returns do not converge to the Normal distribution but
to some other distribution from the same family. According to Mandelbrot,
returns indeed have finite variance, but the Normal distribution underestimates
the large returns while the other Lévy-stable distributions overestimate them.
That is why we start with an introduction to the Normal and Student's
t-distribution and then have a look at how VaR and ES are calculated.

2.5.1 Normal Distribution


The first thing we do not know about a return series is its distribution. This
is where the Normal distribution comes in: the Central Limit Theorem tells us
that the sum of a fairly large number of random variables converges to the
Normal distribution (Rice 1995). If Φ denotes the standard Normal distribution
function with

Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−y²/2} dy

and the loss L ∼ N(μ, σ²), then for α ∈ (0, 1)

VaR_α(L) = μ + σ Φ⁻¹(α)

By applying the properties of the coherent risk measures,

ES_α(L) = ES_α(μ + σX)  as L ∼ N(μ, σ²) and X ∼ N(0, 1)
        = μ + σ ES_α(X)
        = μ + σ φ(Φ⁻¹(α)) / (1 − α)

where φ is the standard Normal density.
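These closed forms are straightforward to evaluate; a minimal sketch using only
the Python standard library (the values μ = 0, σ = 1 and α = 97.5% are
illustrative, not taken from the thesis):

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def var_normal(mu, sigma, alpha):
    # VaR_alpha(L) = mu + sigma * Phi^{-1}(alpha)
    return mu + sigma * NormalDist().inv_cdf(alpha)

def es_normal(mu, sigma, alpha):
    # ES_alpha(L) = mu + sigma * phi(Phi^{-1}(alpha)) / (1 - alpha)
    z = NormalDist().inv_cdf(alpha)
    phi = exp(-z * z / 2) / sqrt(2 * pi)   # standard Normal density at z
    return mu + sigma * phi / (1 - alpha)

alpha = 0.975
v = var_normal(0.0, 1.0, alpha)  # about 1.96
e = es_normal(0.0, 1.0, alpha)   # about 2.34, always above the VaR
```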
2.5.2 Student's t-distribution
Although the Central Limit Theorem tells us about the convergence to the Normal
distribution, the speed of convergence is comparatively low. Therefore the
convergence is barely observed. This is mainly because of volatility clustering,
a stylized fact of returns telling us that there will be high volatility
tomorrow if there has been high volatility today (Cont 2001). Volatility
clustering makes the return distribution leptokurtic, and the Student's
t-distribution likewise has a kurtosis larger than that of the Normal
distribution. So the density function of the Student's t-distribution is given
by:

f(t) = [Γ((v+1)/2) / (√(vπ) Γ(v/2))] (1 + t²/v)^{−(v+1)/2}

Here, v is the degrees of freedom. If L ∼ t(v, μ, σ²) with v degrees of freedom,
then

VaR_α(L) = μ + σ t_v⁻¹(α)

According to McNeil et al. (2015), ES would be calculated as follows:

ES_α(L) = μ + σ · [g_v(t_v⁻¹(α)) / (1 − α)] · [(v + (t_v⁻¹(α))²) / (v − 1)]

Here, g_v is the density and t_v the distribution function of the standard
t-distribution.
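Since the Python standard library has no t-distribution, the heavier tail can
still be illustrated by simulation: a Student's t variate is Z/√(χ²_v/v). The
sketch below (v = 5 and the seed are arbitrary choices, not thesis values) shows
that the empirical 99% quantile and ES of t returns exceed the Normal quantile:

```python
import random
from statistics import NormalDist

random.seed(42)
v = 5            # degrees of freedom, illustrative
n = 100_000

def t_draw():
    # Student's t via Z / sqrt(chi2_v / v); chi2_v as a sum of v squared normals
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(v))
    return z / (chi2 / v) ** 0.5

sample = sorted(t_draw() for _ in range(n))
alpha = 0.99
var_t = sample[int(alpha * n)]            # empirical alpha-quantile of t(5)
tail = sample[int(alpha * n):]
es_t = sum(tail) / len(tail)              # empirical ES beyond that quantile
var_norm = NormalDist().inv_cdf(alpha)    # the Normal quantile, about 2.33
```

The fatter t tail pushes both the quantile and the tail average well above the
Normal benchmark, which is exactly why the distributional assumption matters.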

2.6 BASEL III Accords

The Basel Committee on Banking Supervision issued the third accord in June 2011.
After the financial crisis of 2007, there was high demand to introduce and
change regulations so as to minimize any future crisis. The ground objective of
BASEL III was to fix thresholds for many different indicators of risk in a
financial institution. The reasons behind this move were pretty obvious: this
crisis was the worst one since the Great Depression of the 1930s, and the
regulations for managing risks were weak (Kahn 2008). These resulted in rapid
business expansion and less attention to managing risks, i.e. risks that were
less well calculated. According to Gordy et al. (2006), the strict regulations
on banks' lending procedures in a way pushed further towards the financial
crisis. One of the key features of the BASEL III accord is the enhancement of
risk coverage. It urged higher capital requirements, especially for
mortgage-backed securities. It has also raised and upgraded the guidelines of
all the Pillars. Now banks can have their own sets of standards, but these must
comply with the regulations of BASEL III. Under the regulations, VaR has to be
computed on a daily basis with a coverage rate of 99%, and a year's data has to
be used as a sample. Banks may also use the variance-covariance method,
historical simulation or Monte Carlo simulation to model the risk measures.
BASEL III has also introduced Stressed Value-at-Risk (SVaR) and Expected
Shortfall backtesting; Expected Shortfall has to be calculated with a 97.5%
coverage rate.

3 Modelling Risk Measures
3.1 Gaussian Method

The Gaussian method of Value-at-Risk estimation assumes the returns to be
Normally distributed. Its workload is higher than that of historical simulation
but less than that of Monte Carlo simulation. If {Z_t} are the risk factors,
then X_{n+1} = Z_{n+1} − Z_n is the risk-factor change. So the estimated mean
and covariance are:

μ̂_k = (1/t) Σ_{i=1}^t X_{t−i+1,k}  for k = 1, ..., d

Σ̂_{kj} = (1/(t−1)) Σ_{i=1}^t (X_{t−i+1,k} − μ̂_k)(X_{t−i+1,j} − μ̂_j)  for k, j = 1, ..., d

So the VaR and ES of the loss, for portfolio weights w_t, would be:

VaR̂_α(L_{n+1}) = −(c_t + w_t′ μ̂) + √(w_t′ Σ̂ w_t) Φ⁻¹(α)

EŜ_α(L_{n+1}) = −(c_t + w_t′ μ̂) + √(w_t′ Σ̂ w_t) φ(Φ⁻¹(α)) / (1 − α)

As, in most cases, the daily returns are not Normally distributed, this method
does not produce an efficient result.

3.2 Historical Simulation

Historical simulation is the most widespread method of estimating Value-at-Risk.
This method is not time-consuming and is easy to implement; a parametric
assumption is also not necessary. It considers the percentiles of the returns to
calculate VaR and ES. For {R_t} independent and identically distributed random
variables, the function

F_t(r) = (1/t) Σ_{i=1}^t 1_{[R_i, ∞)}(r)

is the empirical distribution function of {R_t}. The α-quantile of F_t for
α ∈ (0, 1) is:

q_α(F_t) = F_t^←(α) = inf{r ∈ ℝ : F_t(r) ≥ α}

So the empirical Expected Shortfall at level α is:

ÊS_α(F_t) := (1/([t(1−α)] + 1)) Σ_{i=1}^{[t(1−α)]+1} R_{i:t}

where R_{i:t} is the i-th order statistic of the historical returns. The
historical simulation method needs a greater number of samples, but it still
responds slowly to a more volatile period (Boudoukh, Richardson and Whitelaw,
1998) because it puts equal weight on all the returns.
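As a sketch with made-up returns (not DSE data), the empirical VaR and ES can be
read off the sorted losses exactly as in the formulas above:

```python
import random

random.seed(1)
# hypothetical daily log returns; losses are the negated returns
returns = [random.gauss(0.0005, 0.012) for _ in range(2000)]
losses = sorted((-r for r in returns), reverse=True)  # largest loss first

def hist_var_es(sorted_losses, alpha):
    t = len(sorted_losses)
    k = int(t * (1 - alpha)) + 1       # number of tail losses, [t(1-alpha)] + 1
    tail = sorted_losses[:k]
    var = tail[-1]                     # the ([t(1-alpha)]+1)-th largest loss
    es = sum(tail) / k                 # average of the k largest losses
    return var, es

var99, es99 = hist_var_es(losses, 0.99)
```

Because all 2000 observations enter with equal weight, yesterday's calm period
dilutes today's turbulence, which is the slow-response problem noted above.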

3.3 Monte Carlo Simulation

The Monte Carlo method is the most flexible method for calculating VaR and ES.
It is implemented by generating random numbers, putting them into the model and
observing the outcome. The precision of the result depends on the generated
random numbers: a higher number of scenarios yields higher precision. The steps
of the method to be implemented are as follows:

- Define a time horizon, e.g. to compute daily or weekly VaR and ES.

- Generate N random numbers from a random number generator.

- Update the VaR and ES calculation for the first interval of the time horizon,
and so on.

- From there, calculate whether the losses exceed the VaR. This step has to be
repeated for all the returns conditional on the risk factors.

Normality of returns is not assumed in the Monte Carlo simulation method; it can
also incorporate subjective assumptions about the risk factors, not only the
observed returns. So the estimators of VaR and ES are:

VaR̂_α = q̂_α(L_{t+1}) = l_{[t(1−α)]+1:t}

ÊS_α = (1/([t(1−α)] + 1)) Σ_{i=1}^{[t(1−α)]+1} l_{i:t}

where l_{i:t} is the i-th ordered loss and L_{t+1} is the loss function with

L_{n+1} = −(V_{n+1} − V_n) = l_{[n]}(X_{n+1})

V_t is the portfolio value with V_t = f(t_n, Z_n), where Z_t are the risk factors.
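The steps above can be sketched end-to-end; the one-factor revaluation function
and all parameter values below are hypothetical, chosen only to make the example
self-contained:

```python
import random

random.seed(7)
N = 100_000        # number of scenarios; more scenarios, higher precision
alpha = 0.975

def portfolio_loss(dz):
    # hypothetical revaluation: loss from a nonlinear position in one risk factor
    return -(100 * dz - 50 * dz ** 2)

# steps 1-2: generate N risk-factor changes over the chosen horizon
scenarios = [random.gauss(0.0, 0.01) for _ in range(N)]
# step 3: revalue the portfolio in every scenario and order the losses
losses = sorted((portfolio_loss(dz) for dz in scenarios), reverse=True)
# step 4: read VaR and ES off the ordered losses, as in the estimators above
k = int(N * (1 - alpha)) + 1
mc_var = losses[k - 1]
mc_es = sum(losses[:k]) / k
```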

4 Volatility Specication
4.1 ARMA(1,1)

A time series process which uses past values directly in forecasting is called
an autoregressive (AR) model. An AR model with lag 1 can be defined as:

r_t = φ_0 + φ_1 r_{t−1} + ε_t

φ_1 is the weight on the lagged value and ε_t is white noise with
ε_t ∼ WN(0, σ²).

Mean: E(r_t) = φ_0 / (1 − φ_1)

Variance: Var(r_t) = σ² / (1 − φ_1²)

We can also relate r_t to the past error term ε_{t−1} rather than the lagged
r_t. This type of moving average with lag 1 can be defined as:

r_t = θ_0 + θ_1 ε_{t−1} + ε_t

θ_1 is the weight on the lagged shock/white-noise term ε_{t−1} with
ε_t ∼ WN(0, σ²).

Mean: E(r_t) = θ_0

Variance: Var(r_t) = (1 + θ_1²) σ²

A more flexible model for the time series is the combination of both MA(1) and
AR(1). The process r_t is an ARMA(1,1) if the following holds:

r_t = φ r_{t−1} + θ ε_{t−1} + ε_t

The reason for choosing this model is the presence of the error term inside the
AR(1) process, so this error term also follows an MA(1). Therefore, ARMA(1,1) is
more efficient than AR(1) or MA(1).

4.2 Simple ARCH(1,1)

Suppose r_t is the log return at time t. The question arises whether the process
{r_t} is serially uncorrelated or not. So, to capture volatility clustering,
Engle (1982) introduced the Autoregressive Conditional Heteroskedasticity (ARCH)
model:

r_t = c + ε_t

ε_t = σ_t z_t  with z_t iid ∼ N(0, 1)

σ_t² = ω + α ε_{t−1}²,  ω > 0, α > 0

r_t is serially uncorrelated, but its dependence can be specified by the ARCH
model. As the ARCH model has the same effect for positive and negative shocks,
it somewhat overestimates the volatility and responds very slowly to large
variations. Hence, Bollerslev (1986) proposed an extension known as Generalized
ARCH, or GARCH.

4.3 Extension to Standard GARCH

For a mean-corrected log return r_t, the process follows a GARCH(1,1) model if:

ε_t = σ_t z_t  with z_t iid ∼ N(0, 1)

σ_t² = ω + α ε_{t−1}² + β σ_{t−1}²,  ω > 0, α ≥ 0, 0 ≤ β < 1

The parameter α measures the immediate effect of a shock on the next period's
conditional variance, and β determines how long the effect of the shock will
stay. If α + β < 1, the process is covariance stationary.
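The recursion is easy to simulate; the parameter values below are illustrative
(not estimates from the DSE data) and satisfy α + β < 1, so the sample variance
should settle near the unconditional variance ω/(1 − α − β):

```python
import random

random.seed(3)
omega, a, b = 0.00002, 0.08, 0.90   # a + b < 1: covariance stationary
n = 50_000

sigma2 = omega / (1 - a - b)        # start at the unconditional variance
eps_prev = 0.0
returns = []
for _ in range(n):
    # GARCH(1,1) recursion: sigma_t^2 = omega + a*eps_{t-1}^2 + b*sigma_{t-1}^2
    sigma2 = omega + a * eps_prev ** 2 + b * sigma2
    eps_prev = sigma2 ** 0.5 * random.gauss(0.0, 1.0)
    returns.append(eps_prev)

mean = sum(returns) / n
sample_var = sum((r - mean) ** 2 for r in returns) / n
uncond_var = omega / (1 - a - b)    # 0.001 with these parameters
```

The simulated path shows volatility clustering even though the shocks z_t are
iid, which is exactly the stylized fact the model is built to capture.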

4.4 Exponential GARCH

As GARCH shares the symmetric treatment of shocks with ARCH, Nelson (1991)
proposed a model called Exponential GARCH to overcome this weakness. The model
is specified as follows:

log(σ_t²) = ω + γ z_{t−1} + δ (|z_{t−1}| − E|z_{t−1}|) + β log(σ_{t−1}²)

The conditional variance always stays positive. For (a) γ < 0, negative shocks
have a stronger effect; (b) γ = δ (γ = −δ), only positive (negative) shocks have
an effect.

4.5 Integrated GARCH

The conditional variance of the Integrated GARCH process has the same form as in
the simple GARCH model:

σ_t² = ω + α ε_{t−1}² + β σ_{t−1}²

with the necessary condition α + β = 1. For ω = 0, the process {σ_t²} is a
martingale and a random walk without drift; for ω > 0, it is a random walk with
drift. So in the martingale case (ω = 0), σ_t² converges to zero.

4.6 GJR - GARCH

Negative shocks typically have a stronger effect on volatility than positive
shocks, so an asymmetric GARCH was proposed by Glosten, Jagannathan and Runkle
in 1993. The conditional variance process is as follows:

σ_t² = ω + (α + γ S_{t−1}) ε_{t−1}² + β σ_{t−1}²

where

S_{t−1} = 1 if ε_{t−1} < 0,  and S_{t−1} = 0 if ε_{t−1} ≥ 0

For γ > 0, next period's variance is negatively correlated with the previous
(especially today's) return. The process is covariance stationary if the
following holds:

α + γ/2 + β < 1

4.7 Component GARCH

Engle and Lee introduced the Component version of the simple GARCH in 1999. This
model divides the conditional variance into a less persistent and a more
persistent part. The model is specified as follows:

σ_t² = q_t + α (ε_{t−1}² − q_{t−1}) + β (σ_{t−1}² − q_{t−1})

with q_t being the trend of the series,

q_t = ω + ρ q_{t−1} + ψ (ε_{t−1}² − σ_{t−1}²)

The process is covariance stationary if α + β < 1 and ρ < 1.

5 Backtesting Methodologies
This section will explain three different backtesting methods for Expected
Shortfall. There have not been many methods published, but we will try to find
out which method is most efficient for the data set we have chosen. We denote by
T the in-sample size, which is 2000 for the first window and 2250 for the second
window. R_t is the return at discrete time t. Backtesting will be implemented on
t = 250 observations, which is the window size. We also let F_t be the
cumulative distribution function, based on a certain parametric assumption,
through which we forecast the data for t, while P_t is the actual return
distribution, which is unknown to us. So we calculate the Value-at-Risk from the
forecasted data according to the parametric assumption. When a realized return
is less than or equal to the negative of the estimated Value-at-Risk, we call it
a violation. Mathematically:

I_{t+1} = 1 if R_t ≤ −VaR_t,  and 0 otherwise

I_t is the indicator recording the VaR violations. Moreover, let:

X_n = {I_t R_t | I_t = 1}  for n = 1, ..., t

so that for N > 0 violations the mean return over the violations is:

X̄_ES = (1/N) Σ_{i=1}^t X_i  with N = Σ_{i=1}^t I_i

Based on the above assumptions, we explain the backtesting methods very briefly
in the following sections.

5.1 Wong's Saddlepoint Approach

In 2008, Wong suggested a method which needs a parametric assumption but no
simulations to perform the test. The idea is to estimate the probability of the
true tail mean being less than the observed mean, where Expected Shortfall is
calculated under the Normal or Student's t-distribution. If returns follow a
standard Normal distribution, then q = Φ⁻¹(α). The scaled density of the tail
returns is, therefore,

f_R(r) = α⁻¹ φ(r)  for r ≤ q

So the appropriate Moment Generating Function (MGF) looks like (the proofs are
derived from Daniels (1987)):

M(t) = ∫_{−∞}^q e^{tr} α⁻¹ φ(r) dr
     = α⁻¹ ∫_{−∞}^q e^{tr} (1/√(2π)) e^{−r²/2} dr
     = α⁻¹ e^{t²/2} Φ(q − t)

Here, φ(·) and Φ(·) are the PDF and CDF of the standard Normal distribution.
The first and second derivatives of the above MGF are:

M′(t) = t M(t) − α⁻¹ e^{qt} φ(q)

M″(t) = t M′(t) + M(t) − α⁻¹ q e^{qt} φ(q)

Now, by applying the logarithm to the MGF we get the Cumulant Generating
Function, whose first derivative defines the saddlepoint:

K(t) = ln(M(t))

K′(t) = M′(t) / M(t)

K″(t) = M″(t)/M(t) − (M′(t)/M(t))²

Using the MGF and its first derivative, we derive the saddlepoint t̂ from the
following equation, where r̄ is the observed mean of the tail returns:

K′(t̂) = t̂ − exp(q t̂ − t̂²/2) φ(q)/Φ(q − t̂) = r̄

After deriving t̂, according to Lugannani and Rice (1980), we then derive:

ζ = t̂ √(N K″(t̂))

ω = sgn(t̂) √(2N (t̂ r̄ − K(t̂)))

sgn(s) = −1 for s < 0,  0 for s = 0,  1 for s > 0

So the tail probability would be:

P(R̄ < r̄) = Φ(ω) − φ(ω) (1/ζ − 1/ω) + O(N^{−3/2})  for r̄ ≤ q

P(R̄ < r̄) = 1  for r̄ > q
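Under the standard-Normal assumption the whole test reduces to a few lines of
numerical work: solve K′(t̂) = r̄ by bisection and plug into the Lugannani-Rice
expansion. The sketch below is an assumed implementation, not the thesis code
(the exceedance mean r̄ = −2.9 and count N = 5 are made-up inputs, and the
bisection bracket is a pragmatic choice):

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

nd = NormalDist()
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)   # standard Normal pdf
Phi = nd.cdf                                      # standard Normal cdf

alpha = 0.01
q = nd.inv_cdf(alpha)        # q = Phi^{-1}(alpha), about -2.326

def K(t):   # cumulant generating function of the scaled tail density
    return -log(alpha) + t * t / 2 + log(Phi(q - t))

def K1(t):  # K'(t) = t - phi(q - t)/Phi(q - t)
    return t - phi(q - t) / Phi(q - t)

def K2(t):  # K''(t), obtained by differentiating K'(t)
    g = phi(q - t) / Phi(q - t)
    return 1 - (q - t) * g - g * g

def wong_pvalue(r_bar, n):
    # bisection for the saddlepoint t_hat solving K'(t_hat) = r_bar
    lo, hi = -30.0, 5.0      # K1 is increasing on this bracket
    for _ in range(200):
        mid = (lo + hi) / 2
        if K1(mid) < r_bar:
            lo = mid
        else:
            hi = mid
    t_hat = (lo + hi) / 2
    zeta = t_hat * sqrt(n * K2(t_hat))
    omega = (1 if t_hat > 0 else -1) * sqrt(max(0.0, 2 * n * (t_hat * r_bar - K(t_hat))))
    # Lugannani-Rice tail probability P(mean of n tail returns < r_bar)
    return Phi(omega) - phi(omega) * (1 / zeta - 1 / omega)

p_severe = wong_pvalue(-2.9, 5)   # exceedances worse than E[R | R <= q], about -2.67
p_mild = wong_pvalue(-2.7, 5)
```

A more severe observed tail mean should give a smaller p-value, which the two
hypothetical calls illustrate.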

5.2 Approaches of Acerbi and Szekely

In 2014, Acerbi and Szekely proposed three methods for backtesting Expected
Shortfall under slightly different assumptions. The three Z-statistics yield
z-scores for the realized returns, and the hypotheses are also somewhat similar.
We are going to have a look at them in the following sections:

5.2.1 First Approach


Test statistic:

Z₁(R) = (1/N) Σ_{i=1}^t (I_i R_i / ES_{α,i}) + 1  for N > 0

with the null and alternative hypotheses:

H₀: P_i = F_i for all t, against
H₁: ES_{α,i}^P ≥ ES_{α,i}^F for all t, with > for some t, and
    VaR_{α,i}^P = VaR_{α,i}^F for all t

F_i is the distribution used for forecasting and P_i is the actual return
distribution.

5.2.2 Second Approach


Test statistic:

Z₂(R) = (1/(tα)) Σ_{i=1}^t (I_i R_i / ES_{α,i}) + 1  for N > 0

with the null and alternative hypotheses:

H₀: P_i = F_i for all t, against
H₁: ES_{α,i}^P ≥ ES_{α,i}^F for all t, with > for some t, and
    VaR_{α,i}^P ≥ VaR_{α,i}^F for all t

The difference between Tests 1 and 2 is captured by the following relationship:

Z₂ = 1 − (N/(tα)) (1 − Z₁)

Note: E_{H₀}[N] = tα.
5.2.3 Third Approach

This method tests whether the observed ranks U_i = F_i(R_i) of the realized returns
are iid and follow the Uniform distribution U(0,1), which holds if the forecast
distribution is correctly specified. The test requires ⌊t·α⌋ ≥ 1. The Expected
Shortfall is estimated as

ÊS_α^(N)(R⃗) = −(1/⌊N·α⌋) · Σ_{i=1}^{⌊N·α⌋} R_{i:N},

where R_{i:N} is the i-th order statistic. Hence, the test statistic for the third method is

Z₃ = −(1/t) · Σ_{i=1}^{t} ÊS_α^(t)(F_i⁻¹(U⃗)) / E_V[ÊS_α^(t)(F_i⁻¹(V⃗))] + 1,

where V⃗ = {V_i} iid ~ U(0,1). The denominator is calculated via

E_V[ÊS_α^(t)(F_i⁻¹(V⃗))] = −(t/⌊t·α⌋) · ∫₀¹ I_{1−p}(t − ⌊t·α⌋, ⌊t·α⌋) · F_i⁻¹(p) dp,

where I_x(a, b) is the regularized incomplete Beta function. The null and alternative hypotheses for Z₃ are:

H0: F_i = P_i for all i
H1: F_i ⪰ P_i for all i, with ≻ for some i.

Note: ⪰ denotes weak stochastic dominance.
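As an illustration for the simplified case in which all t forecast distributions coincide with one frozen scipy.stats distribution F, the statistic can be sketched as below; approximating the denominator by Monte Carlo instead of evaluating the incomplete-Beta integral is our simplification, not the thesis implementation:

```python
# Hedged sketch of the Acerbi-Szekely Z3 statistic for a single fixed
# forecast distribution F (a scipy.stats distribution object).
import numpy as np
from scipy.stats import norm

def es_hat(sample, alpha):
    """ES estimator: minus the mean of the floor(N*alpha) smallest values."""
    k = int(np.floor(len(sample) * alpha))
    return -np.sort(np.asarray(sample))[:k].mean()

def as_z3(R, F, alpha=0.025, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    t = len(R)
    U = F.cdf(np.asarray(R))                 # PIT ranks; iid U(0,1) under H0
    num = es_hat(F.ppf(U), alpha)            # realized tail severity
    den = np.mean([es_hat(F.ppf(rng.uniform(size=t)), alpha)
                   for _ in range(n_sim)])   # Monte Carlo E_V[...] term
    return -num / den + 1.0
```

For a well-specified F the statistic fluctuates around zero; strongly negative values indicate that the realized tail is heavier than forecast.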

5.3 Costanzino and Curran's Approaches

5.3.1 Spectral Risk Measure


The Spectral Risk Measure is a coherent risk measure, unlike Value-at-Risk (Acerbi,
2002). It assigns larger weights to the worse outcomes. If X is a random variable
with distribution function F_X, then according to Acerbi (2002) the Spectral Risk
Measure is

M_φ = ∫₀¹ φ(p) · VaR(p) dp = −∫₀¹ φ(p) · F_X⁻¹(p) dp.

Here φ(p) is the admissible risk spectrum: it is non-negative, non-increasing, and
φ ∈ L¹([0,1]) with ‖φ‖ = 1.

5.3.2 Defining Test Statistic


Costanzino and Curran (2015) have defined Expected Shortfall as a special case of
the Spectral Risk Measure with

φ_ES(p) = (1/α) · 1{0 ≤ p ≤ α}.

According to the definition of the Spectral Risk Measure, the following holds:

M_ES(α) = (1/α) · ∫₀^α VaR(p) dp.
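As a quick numerical sanity check (ours, for a standard normal return, where VaR(p) = −Φ⁻¹(p)), this integral reproduces the closed-form normal Expected Shortfall φ(Φ⁻¹(α))/α:

```python
# Verify numerically that (1/alpha) * int_0^alpha VaR(p) dp equals the
# closed-form normal ES, with VaR(p) = -Phi^{-1}(p) for an N(0,1) return.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

alpha = 0.025
spectral_es, _ = quad(lambda p: -norm.ppf(p) / alpha, 0.0, alpha)
closed_form = norm.pdf(norm.ppf(alpha)) / alpha   # about 2.338 for alpha=2.5%
```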
The Expected Shortfall failure rate is, therefore,

X_ES^(N)(α) := (1/t) · Σ_{i=1}^{t} ∫₀^α (1/α) · 1{L_i > VaR_i(p)} dp.

So the coverage test statistic for Expected Shortfall is

Z_ES^(N) = (X_ES^(N)(α) − μ_ES(α)) / σ_ES(α),   with

μ_ES(α) = α/2   and   σ_ES(α) = √( α · (4 − 3α) / (12 · N) ),

which is asymptotically standard normal under H0.
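For a continuous forecast distribution, the inner integral reduces to (α − u_i)⁺ for the PIT value u_i = F_i(R_i), so the whole test fits in a few lines. The reduction and the function below are our sketch, not the authors' presentation:

```python
# Hedged sketch of the Costanzino-Curran ES coverage test. u holds the
# PIT values F_t(R_t) of the realized returns under the forecast models;
# under H0 they are iid U(0,1), and the breach-severity integral over
# [0, alpha] equals max(alpha - u_t, 0).
import numpy as np
from scipy.stats import norm

def cc_es_test(u, alpha=0.025):
    u = np.asarray(u)
    N = len(u)
    x_es = np.mean(np.maximum(alpha - u, 0.0)) / alpha   # ES failure rate
    mu = alpha / 2.0
    sigma = np.sqrt(alpha * (4.0 - 3.0 * alpha) / (12.0 * N))
    z = (x_es - mu) / sigma
    return z, 2.0 * norm.sf(abs(z))     # statistic and two-sided p-value
```

A perfectly calibrated PIT sample gives X_ES = α/2 and hence Z = 0.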

6 Empirical Results
6.1 Data Selection

For the empirical review we have chosen the daily indices of the DSE (Dhaka Stock
Exchange). The data were obtained internally from the office of the Stock Exchange
in Dhaka through Mr. Khairul Alom and Mr. Muhammad Raquib. The time frame
is between 01.01.2002 and 27.11.2011. For the research we have chosen an in-sample
of T=2000 for the first out-of-sample window of 250 days and T=2250 for the
second out-of-sample window as the experimental period.
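The window scheme can be sketched as a rolling one-step-ahead loop (an illustrative helper with hypothetical names, not the thesis code):

```python
# Hedged sketch of the rolling one-step-ahead forecast scheme: for each
# day in a 250-day out-of-sample window, the model is re-fit on the
# preceding `window` observations.
def rolling_one_step(t0, out_len=250, window=2000):
    """Yield (in-sample slice, forecast index) pairs."""
    for i in range(out_len):
        yield slice(t0 + i - window, t0 + i), t0 + i
```

For the first window t0 = 2000 and window = 2000; for the second, t0 = 2250 and window = 2250.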

The Great Recession of the 21st century started in the United States in 2007 and
gradually spread through the world economy. Remarkably, despite being no giant
player in the global economy, Bangladesh managed not to be affected by this
global recession (Islam, Sultana and Kamal, 2013). The recession ended in 2009 in
the US market, but its aftermath lingered and somehow affected countries like
Bangladesh after 2009. The stock market faced more volatile price movements:
there were sudden upward movements on a large scale, followed by an almost
historic low return. Considering these circumstances, the window period was set in
this specific way.

6.2 Analysis of Returns

Observations   Low       High     1st Quartile   3rd Quartile
2500           -0.0933   0.2038   -0.0052        0.0071

Mean     Median   Variance   Standard Deviation   Skewness   Kurtosis
0.0007   0.0006   0.0002     0.01447              1.102256   24.4456

Table 1: Descriptive Statistics of Log Returns

Table 1 shows the basic statistics of the log returns of the 2500 samples. We can see
that the mean and the variance are both close to zero. The returns have a positive
skewness, meaning that the mass is concentrated on the left with a longer right tail.
They also show an excess kurtosis of 24.4456, which means that the distribution is
leptokurtic: heavy tails, i.e. extreme events, are more likely. This brings us to the
goodness-of-fit: the Jarque-Bera test on the time series shows that normality is
significantly rejected (Table 2).
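As a rough arithmetic cross-check, plugging the rounded moments from Table 1 into the Jarque-Bera formula JB = n/6 · (S² + K²/4), with K the excess kurtosis, approximately reproduces the statistic reported in Table 2:

```python
# Jarque-Bera statistic from the sample skewness S and excess kurtosis K;
# n, S, K are taken from Table 1 (rounded values, so the result is only
# approximate).
n, S, K = 2500, 1.102256, 24.4456
jb = n / 6.0 * (S**2 + K**2 / 4.0)
print(round(jb))   # about 62755, close to the 62868 reported in Table 2
```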

JB-Test Statistic   p-value   Degrees of Freedom
62868               0.0000    2

Table 2: Goodness-of-Fit Test

Figures 2 and 3 depict the indices and the log returns of the Dhaka Stock Exchange
from 01.01.2002 to 27.11.2011. At first glance we can say that the index series is not
stationary, as it shows trends, seasonality and random-walk behaviour. During the
recession period there were only the regular ups and downs, with no significant
events. Even at the beginning of 2010 returns were stable, apart from some high
points, including the historic high of 8918.51 points in December 2010. These
unforeseen highs affected the later part of the series, resulting in many negative
returns, i.e. losses.

Figure 2: Indices of DSE

Figure 3: Log Returns of the DSE

The Box-Ljung test has also been performed to test whether there are significant
autocorrelations between different lags of the returns and of the squared returns.

Statistic (returns)           p-value (returns)
10.4770                       0.0012
Statistic (squared returns)   p-value (squared returns)
97.9200                       0.0000

Table 3: Box-Ljung Test

With p-values far below any common significance level, the test rejects the null
hypothesis of no autocorrelation for both the returns and the squared returns
(Table 3). That is why we use ARMA(1,1) for the mean part and GARCH(1,1) for
the variance part to fit the data. Besides, we also use AR(1) for the mean part as a
check against misspecification. Figure 4 clearly shows that the fitted model has
almost removed the serial dependence. Moreover, on the standardized residuals the
Ljung-Box test no longer rejects the null hypothesis of no serial dependence.
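A minimal implementation of the test (a pure-NumPy sketch; the lag choice h = 1 is our inference, consistent with the statistic and p-value reported in Table 3):

```python
# Pure-NumPy Ljung-Box test with h lags; a sketch of the Table 3
# computation, not the thesis code.
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h=1):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom    # lag-k autocorrelation
        q += rho_k**2 / (n - k)
    q *= n * (n + 2)
    return q, chi2.sf(q, df=h)                     # statistic, p-value
```

A strongly autocorrelated series yields a large statistic and a p-value near zero, as in Table 3.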

Figure 4: Test for Heteroskedasticity

Given the high kurtosis, we now examine the tails of the returns. We compare the
empirical QQ-plot against the normal QQ-plot (Figure 5). It is clearly observable
that the returns do not follow a Normal distribution. The standardized returns still
indicate excess kurtosis, i.e. fat tails, but they lie much closer to the normal
reference line. Therefore, we also consider the Student's t-distribution when
forecasting with the AR(1)-GARCH(1,1) and ARMA(1,1)-GARCH(1,1) models.

Figure 5: QQ Plots of DSE Log Returns and Standardized Log Returns

6.3 Parameter Estimation

Table 4 shows the estimates of the parameters of the different GARCH(1,1) models.
The parameters have been estimated by the Maximum Likelihood Estimation (MLE)
method; the standard errors of the estimates are given in parentheses. We can
observe that, except for the Exponential GARCH, today's conditional variance is
formed with a key influence from yesterday's error term. The error terms of the
Simple GARCH are covariance stationary under both the Normal and the Student's
t-distribution, as α₁ + β₁ < 1. For the Exponential GARCH, as |β₁| < 1, the
conditional variance is again stationary.

The γ₁ parameter of the GJR-GARCH tells us that today's variance is conditioned
on yesterday's return, with a negative correlation, as γ₁ > 0 for both the Normal and
the t-distribution. In the case of the Integrated GARCH, since ω is slightly greater
than 0, we cannot conclude that the conditional variance is strictly stationary; hence
a pure random-walk behaviour is not definite. From the Component GARCH
estimation no clear conclusion emerges. Likewise, no strong conclusion can be drawn
from the log-likelihood (LL): it indicates that the t-distribution fits the data somewhat
better than the Normal distribution, but the difference is not significant.

Parameters   sGnorm       sGstd        eGnorm       eGstd        iGnorm       iGstd        gjrGnorm     gjrGstd      csGnorm      csGstd

ω            0.000002     0.000002     -0.309082    -0.352390    0.000002     0.000002     0.000002     0.000002     0.000000     0.000000
             (0.000001)   (0.000002)   (0.070643)   (0.094804)   (0.000001)   (0.000002)   (0.000001)   (0.000002)   (0.000000)   (0.000000)
α₁           0.169290     0.182906     -0.008232    -0.962399    0.169982     0.183916     0.165729     0.169877     0.0145506    0.160180
             (0.019504)   (0.028340)   (0.012786)   (0.017224)   (0.017902)   (0.025357)   (0.027069)   (0.027961)   (0.017943)   (0.020866)
β₁           0.829710     0.816094     0.965487     0.962399     0.830018     0.816084     0.828958     0.813972     0.775781     0.776978
             (0.018031)   (0.025771)   (0.007587)   (0.010123)   -            -            (0.018097)   (0.025758)   (0.037840)   (0.033786)
γ₁           -            -            0.334781     0.369546     -            -            0.008626     0.030302     -            -
             -            -            (0.028902)   (0.042034)   -            -            (0.020652)   (0.028294)   -            -
η₁,₁         -            -            -            -            -            -            -            -            0.999744     0.999708
             -            -            -            -            -            -            -            -            (0.000090)   (0.000109)
η₂,₁         -            -            -            -            -            -            -            -            0.017026     0.016111
             -            -            -            -            -            -            -            -            (0.001923)   (0.001601)
LL           6513.437     6568.168     6522.602     6575.012     6513.611     6568.273     6513.526     6568.779     6528.399     6577.814

Table 4: Parameter Estimates of the ARMA(1,1)-GARCH(1,1) Models (standard errors in parentheses).


Figure 6: Value-at-Risk for different ARMA(1,1)-GARCH(1,1) Models (Normal distribution and Student's t-distribution panels).

6.4 VaR Calculation

In this section, we calculate the VaR and ES under the Gaussian, Historical and
Monte Carlo methods. A brief summary is presented in Table 5 for a 99% confidence
level (α = 0.01) for VaR.

Method        Expected   Exceedances   α      Percentage (%)
Gaussian      7.6        36            0.01   4.74
Historical    7.6        29            0.01   3.82
Monte Carlo   7.6        28            0.01   3.68

Table 5: Summary of VaR Calculations

It is clearly observed that the percentage of violations is higher than α. Therefore,
VaR at the 99% level and ES at the 97.5% level have been estimated on the forecasts
of the different GARCH models, using rolling windows of 2000 and 2250 days for the
first and the second out-of-sample window, respectively. We use the second window
to compare the distributional fit, as it covers the more volatile part of the sample,
with a significant number of ups and downs (Figure 6). We can see that the Student's
t-distribution gives a better fit, but it also overestimates the significant highs and
lows. When a violation occurs, it recovers quickly, whereas the Normal distribution
fails to do so; as a result, a lot of violations occur. Under the Student's t-distribution,
all the GARCH models perform almost equally: the numbers of violations are equal,
but the Exponential GARCH quickly catches up with the returns, whereas the
Component GARCH overestimates the loss. The Simple and Integrated GARCH
yield almost the same VaR residuals, but under the Normal distribution they are
significantly different.
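The mapping from fitted GARCH(1,1) parameters to the next day's VaR and ES can be sketched as follows; the parameter values in the example are made up for illustration (not the Table 4 estimates) and the mean part is set to zero for brevity:

```python
# Hedged sketch: one-step-ahead VaR and ES from GARCH(1,1) parameters
# under Normal or standardized Student's t innovations (zero mean).
import numpy as np
from scipy.stats import norm, t as student_t

def garch_var_es(eps_t, sig2_t, omega, a1, b1, alpha=0.025, nu=None):
    """Return (VaR, ES) as positive loss numbers for day t+1."""
    sig2 = omega + a1 * eps_t**2 + b1 * sig2_t   # GARCH(1,1) recursion
    sig = np.sqrt(sig2)
    if nu is None:                                # Normal innovations
        q = norm.ppf(alpha)
        var = -sig * q
        es = sig * norm.pdf(q) / alpha
    else:                                         # standardized Student's t
        scale = np.sqrt((nu - 2.0) / nu)          # unit-variance scaling
        q = student_t.ppf(alpha, nu)
        var = -sig * scale * q
        es = sig * scale * student_t.pdf(q, nu) * (nu + q**2) / ((nu - 1.0) * alpha)
    return var, es
```

For unit variance and α = 2.5%, the normal case gives VaR ≈ 1.96 and ES ≈ 2.34; the fat-tailed t case produces a larger ES relative to VaR.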

6.5 Expected Shortfall: Backtesting Results

In this section, we report the p-values of the different backtesting methods for the
different GARCH models, under the assumption that returns follow the Normal or
the Student's t-distribution. As discussed earlier, there are two windows of 250 days
each; the second window is more volatile, with a large number of losses. VaRs have
been calculated on the forecasted data with a 99% coverage rate, and Expected
Shortfalls with 97.5%. The confidence level for each test statistic is 95%. We
examine all the models; the best results are those whose p-values stay inside the
stated confidence level. Under Acerbi-Szekely's first and second methods, p-values
have been generated from 5000 simulations.
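The simulation step can be sketched generically; `statistic` and `simulate_returns` are placeholders for one of the Z-statistics above and for draws from the fitted forecast model, not thesis functions:

```python
# Hedged sketch of the Monte Carlo p-value used for the Acerbi-Szekely
# tests: re-draw returns from the forecast model, recompute the statistic,
# and report the fraction of simulated values at or below the observed one
# (the tests reject for small Z).
import numpy as np

def simulated_pvalue(z_obs, statistic, simulate_returns, n_sim=5000, seed=0):
    rng = np.random.default_rng(seed)
    z_sim = [statistic(simulate_returns(rng)) for _ in range(n_sim)]
    z_sim = np.array([z for z in z_sim if z is not None])  # drop N = 0 draws
    return np.mean(z_sim <= z_obs)
```

As a sanity check, a statistic with a symmetric null distribution evaluated at its median yields a p-value near one half.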

Findings from Tables 6 - 9

• We can observe that the p-values for a given method under the different GARCH
models are almost the same, except for the first method of Acerbi-Szekely during
the second window under ARMA(1,1) with Component GARCH, and in both
windows for AR(1) with Component and Integrated GARCH(1,1).

• In Wong's saddlepoint technique there is a direct relationship between the number
of violations and the resulting p-value: more violations yield higher p-values, and
vice versa. In none of the cases can we reject the null hypothesis, although in most
cases the evidence is not strong.

• Component and Integrated GARCH do display quality forecasts with ARMA(1,1),
although not much different from AR(1). As a result, they produced zero violations
in the first window under the Normal distribution with ARMA(1,1); under the
t-distribution, the Component GARCH yields zero violations in the first window.
However, zero violations make the p-values incomputable under Wong's saddlepoint
technique and under Acerbi-Szekely's first and second methods, as these require
N > 0. We do not see such incidents under the AR(1)-GARCH(1,1) forecasts.

• Under the Normal distribution, in the first window, Wong's method and
Acerbi-Szekely's second and third methods do not lead us to reject the null
hypothesis for either forecasting method, but the values are not significant compared
with the other p-values; Acerbi-Szekely's first method, however, results in significant
values. Costanzino-Curran's method yields higher p-values than the other methods,
with the highest under the Integrated GARCH with zero violations. In the second
window, Acerbi-Szekely's methods do not reject the null hypothesis, although the
values almost fall outside the confidence level, whereas Costanzino-Curran's method
significantly rejects the null hypothesis under all the GARCH models, for both
AR(1) and ARMA(1,1).

• Under the Student's t-distribution we do not see grouped violations, unlike under
the Normal distribution; the t-distribution covers the fat tail more efficiently. There
are indeed no cases in which the null hypothesis is rejected. For ARMA(1,1),
Acerbi-Szekely's first method is the only one that does not yield p-values as high as
the other methods do. In the second window we observe that, under all the GARCH
models, the violation losses occur on almost the same days. In the first window,
Costanzino-Curran's method gives the highest p-value under the Component
GARCH, whereas in the second window the highest p-value occurs under the
Integrated GARCH, although the number of violations is the same under all the
GARCH models.

• In general, forecasting with AR(1)-GARCH(1,1) and ARMA(1,1)-GARCH(1,1)
does not show a significant difference, either in the number of exceedances or in the
p-values. In most cases AR(1)-GARCH(1,1) yields higher p-values, but we cannot
properly conclude which is better, as the differences are really insignificant.

Window-1; T=250 days; 27/10/2009-03/11/2010; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   5        3        5          1         1
Wong         0.7002   0.6337   0.7002     0.5033    0.5033
AS I         0.5250   0.5134   0.5138     0.9773    0.9761
AS II        0.7917   0.8010   0.8082     0.8770    0.8763
AS III       0.8249   0.8222   0.8225     0.8240    0.8218
CC           0.9816   0.9835   0.9831     0.9849    0.9849

Window-2; T=250 days; 04/11/2010-27/11/2011; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   13       10       8          9         10
Wong         0.8341   0.7969   0.7652     0.7820    0.7970
AS I         0.4702   0.4683   0.4669     0.4590    0.4516
AS II        0.1971   0.1967   0.1968     0.1920    0.1951
AS III       0.1772   0.1739   0.1747     0.1750    0.1768
CC           0.0234   0.0205   0.0187     0.0201    0.00198

Table 6: P-values for Different Methods under AR(1)-GARCH(1,1) with Normal Distribution.
Window-1; T=250 days; 27/10/2009-03/11/2010; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   4        3        4          1         4
Wong         0.6705   0.6337   0.6705     0.5033    0.6706
AS I         0.5072   0.5088   0.5099     0.9771    0.9763
AS II        0.8003   0.8005   0.8006     0.8769    0.8764
AS III       0.8195   0.8252   0.8235     0.8215    0.8241
CC           0.9834   0.9835   0.9834     0.9846    0.9834

Window-2; T=250 days; 04/11/2010-27/11/2011; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   4        5        4          4         4
Wong         0.6705   0.7002   0.6705     0.6705    0.6706
AS I         0.5371   0.5345   0.5366     0.9718    0.9713
AS II        0.8037   0.8034   0.8036     0.8679    0.8676
AS III       0.8279   0.8250   0.8252     0.8240    0.8244
CC           0.9819   0.9808   0.9821     0.9819    0.9818

Table 7: P-values for Different Methods under AR(1)-GARCH(1,1) with Student's t-Distribution.
Window-1; T=250 days; 27/10/2009-03/11/2010; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   5        3        6          0         0
Wong         0.7002   0.6337   0.7250     n/a¹      n/a¹
AS I         0.5229   0.5132   0.5112     n/a¹      n/a¹
AS II        0.7915   0.8010   0.8069     n/a¹      n/a¹
AS III       0.8254   0.8208   0.8218     0.8215    0.8219
CC           0.9823   0.9835   0.9823     0.9850    0.9852

Window-2; T=250 days; 04/11/2010-27/11/2011; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   13       10       8          12        9
Wong         0.8341   0.7969   0.7652     0.8228    0.7820
AS I         0.4694   0.4631   0.4664     0.4564    0.4470
AS II        0.1970   0.1964   0.1967     0.2036    0.1945
AS III       0.1747   0.1769   0.1751     0.1736    0.1755
CC           0.0236   0.0208   0.0185     0.0210    0.0201

Table 8: P-values for Different Methods under ARMA(1,1)-GARCH(1,1) with Normal Distribution.
(¹ n/a = not available: with zero violations the p-value cannot be calculated for this method.)
Window-1; T=250 days; 27/10/2009-03/11/2010; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   4        3        4          0         4
Wong         0.6705   0.6337   0.6705     n/a¹      0.6706
AS I         0.5050   0.5066   0.5073     n/a¹      0.9766
AS II        0.8002   0.8003   0.8004     n/a¹      0.8765
AS III       0.8225   0.8245   0.8201     0.8226    0.8231
CC           0.9836   0.9836   0.9835     0.9847    0.9836

Window-2; T=250 days; 04/11/2010-27/11/2011; Confidence level = 95%; α = 2.5%

Method       sGARCH   eGARCH   gjrGARCH   csGARCH   iGARCH
Violations   5        5        5          5         5
Wong         0.7002   0.7002   0.7002     0.7002    0.7002
AS I         0.5278   0.5349   0.5287     0.9732    0.5276
AS II        0.8027   0.8035   0.8027     0.8747    0.8027
AS III       0.8255   0.8247   0.8244     0.8264    0.8212
CC           0.9815   0.9803   0.9820     0.9814    0.9816

Table 9: P-values for Different Methods under ARMA(1,1)-GARCH(1,1) with Student's t-Distribution.
(¹ n/a = not available: with zero violations the p-value cannot be calculated for this method.)
7 Conclusion
The major objective of this thesis was to compare different methods of backtesting
Expected Shortfall under the assumptions of the Normal and the Student's
t-distribution. Different GARCH models have also been used for the variance part
of the forecasts. For the Dhaka Stock Exchange indices, the Normal distribution
produced clustered violations a few times, whereas the Student's t-distribution did
not show such behaviour. Although the Student's t-distribution performed better, it
is still suggested that the skewed normal or the skewed t-distribution should also be
applied to fit the data, because of its fat tails.

Now, for the ARMA(1,1)-GARCH(1,1) models, considering both windows and both
distributions, the Integrated GARCH performed better in general. For the less
volatile period, the GJR-GARCH was the next to provide good results, whereas for
the more volatile period it was the Component GARCH. Still, based on the tests we
performed, we cannot conclude that any GARCH model strongly outperformed the
others. Under AR(1)-GARCH(1,1), the Component and the Integrated GARCH
both seemed to outperform the other specifications. In general, Costanzino and
Curran's method tends to deliver better results than the other proposed methods.
Wong's saddlepoint technique is not recommended, as it gives higher p-values for
higher numbers of violations. The p-values of Acerbi-Szekely's first and second
methods also give an impression of good performance.

It is not clear why the second window was so much more volatile such a long time
after the Great Recession had ended. Hence, these results would not necessarily
carry over to subsequent returns or VaR/ES calculations. It is, therefore,
recommended that updated GARCH models be applied with the Student's
t-distribution, and furthermore with its skewed version, for different in-sample sizes.
Besides, the Exponential distribution could also be applied for its 'memorylessness'
property.

Bibliography
Acerbi, C., "Spectral Measures of Risk: A Coherent Representation of Subjective Risk
Aversion", Journal of Banking & Finance, 26(7): 1505-1518, 2002.
Acerbi, C. and Székely, B., "Backtesting Expected Shortfall", Risk, December, 2015.
Acerbi, C. and Tasche, D., "Expected Shortfall: A Natural Coherent Alternative to
Value-at-Risk", Economic Notes, 31(2): 379-388, 2002.
Acerbi, C. and Tasche, D., "On the Coherence of Expected Shortfall", Journal of
Banking & Finance, 26(7): 1487-1503, 2002.
Artzner, P., Delbaen, F., Eber, J.M. and Heath, D., "Thinking Coherently", Risk,
Vol. 10, No. 11, 68-71, 1997.
Artzner, P., Delbaen, F., Eber, J.M. and Heath, D., "Coherent Measures of Risk",
Mathematical Finance, Vol. 9, Issue 3, 203-228, 1999.
Basel Committee on Banking Supervision. Fundamental Review of the Trading Book:
Outstanding Issues - Consultative Document. Basle, Switzerland: Bank for Interna-
tional Settlements, 2014.
Bollerslev, T., "Generalized Autoregressive Conditional Heteroskedasticity", Journal
of Econometrics, 31 (3): 307-327, 1986.
Clift, Simon S., Costanzino, N. and Curran, M., "Empirical Performance of Back-
testing Methods for Expected Shortfall", URL: https://ssrn.com/abstract=2618345,
April, 2016 (last accessed 15.11.2016).
Cont, R., "Empirical Properties of Asset Returns: Stylized Facts and Statistical Is-
sues", Quantitative Finance, 1(2): 223-236, 2001.
Costanzino, N. and Curran, M., "Backtesting General Spectral Risk Measures with
Application to Expected Shortfall", URL: https://ssrn.com/abstract=2514403, Febru-
ary, 2015 (last accessed 15.11.2016).
Daníelsson, J., Financial Risk Forecasting: The Theory and Practice of Forecasting
Market Risk with Implementation in R and Matlab, John Wiley & Sons, 2011.
Daníelsson, J., de Vries, C. G., Jorgensen, B. N., Mandira, S. and Samorodnitsky, G.,
"Fat Tails, VaR and Subadditivity", Journal of Econometrics, 172(2): 283-291, 2013.
Du, Z. and Escanciano, Carlos J., "Backtesting Expected Shortfall: Accounting for
Tail Risk", Available at SSRN 2548544, 2015 (last accessed 15.11.2016).

Embrechts, P. and Wang, R., "Seven Proofs for the Subadditivity of Expected Short-
fall", Dependence Modeling, 2015.
Engle, R. F., "Autoregressive Conditional Heteroscedasticity with Estimates of the
Variance of United Kingdom Inflation", Econometrica, 50(4): 987-1008, 1982.
Engle, R., and Lee, G., "A Permanent and Transitory Component Model of Stock
Return Volatility", R. Engle and H. White (ed.) Cointegration, Causality, and Fore-
casting: A Festschrift in Honor of Clive W. J. Granger, Oxford University Press,
475-497, 1999.
Fama, E. F., "Mandelbrot and The Stable Paretian Hypothesis", The Journal of
Business, 36(4), 420-429, 1963.
Gordy, M. and Howells, B., "Procyclicality in Basel II: Can We Treat the Disease
Without Killing the Patient?", Journal of Financial Intermediation, 15, Issue-3: 395-
417, 2006.
Islam, M.E. and Sultana, M. and Kamal, A.S.M.M., "Economic Recession and Its
Impact on the Bangladesh's Economy with A Special References of RMG and Re-
mittance Sectors", International Journal of Science and Research, Volume 2 Issue 12,
2013.
Strauss-Kahn, D., "Press Release: Letter from IMF Managing Director Dominique
Strauss-Kahn to the G-20 Heads of Governments and Institutions", URL: http://www.imf.org/
external/np/sec/pr/2008/pr08278.htm, 2008.
Lugannani, R. and Rice, S., "Saddlepoint Approximations for the Distribution of
the Sum of Independent Random Variables", Advances in Applied Probability, 12:
475-490, 1980.
Mandelbrot, B., "The Variation of Certain Speculative Prices", The Journal of Busi-
ness, 36(4), 394-419, 1963.
McNeil, A. J., Frey, R. and Embrechts, P., Quantitative Risk Management: Concepts,
Techniques and Tools, Princeton University Press, 2005.
Nelson, D. B., "Conditional Heteroskedasticity in Asset Returns: A New Approach",
Econometrica, 59 (2): 347-370, 1991.
Rice, J. A., Mathematical Statistics and Data Analysis, Cengage Learning, 1995.
Richardson, M., Boudoukh, J. and Whitelaw, R., "The Best of Both Worlds: A Hybrid
Approach to Calculating Value at Risk", Risk, 1998.
Wong, Woon K., "Backtesting Trading Risk of Commercial Banks using Expected
Shortfall", Journal of Banking & Finance, 32(7): 1404-1415, 2008.
Ziegel, J. F., "Coherence and Elicitability", Mathematical Finance, doi:10.1111/mafi.12080, 2014.

