DEHRADUN
Fundamental: Copper prices are determined by many fundamentals, such as the dollar index, copper
consumption, the housing index, industrial production, the world stock of copper, copper ore
production and copper imports by different countries. Over the past few years, China and the USA
have been the largest importers of copper in the world, and import quantities also have an impact on
copper prices. This study determines the impact of different variables on copper prices using
multiple regression analysis.
In this regression analysis, copper price is the dependent variable, and the dollar index (DX),
China's imports of copper, USA imports of copper, total stock of copper and world consumption of
copper are the independent variables. The variables were selected on the basis of a correlation
analysis: the variables that are least correlated with one another are taken into the analysis as
independent variables.
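The correlation screen described above can be sketched as follows. This is a minimal illustration, assuming that "least correlated" refers to low pairwise correlation among the candidate regressors. The 0.8 cut-off, the function names and the toy columns a, b and c are assumptions made for illustration and are not taken from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name, name, r) for every pair whose |r| reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy data (not from the study): column b is almost a multiple of column a.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.8],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = highly_correlated_pairs(toy)
print(flagged)  # only the pair (a, b) is flagged
```

A flagged pair would suggest keeping only one of the two variables as a regressor. Which one to keep is a modelling decision that this sketch does not make.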
Data partition: The entire data set is divided into three partitions: 80% training data, 10% testing
data and 10% validation data. After partitioning, the data is explored using Enterprise Miner. The
data is checked against the assumptions of linear regression, which involves testing for normality,
detecting outliers and transforming variables, before it is put into the regression analysis.
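The 80/10/10 partition can be illustrated with a short sketch. The source does not say how rows were assigned to partitions, so the random shuffle, the fixed seed and the function name below are assumptions. For time-ordered data, a chronological split may be preferable.

```python
import random


def partition(rows, train=0.8, test=0.1, seed=0):
    """Shuffle rows and split them into training, testing and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(round(train * len(shuffled)))
    n_test = int(round(test * len(shuffled)))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],  # whatever remains is the validation set
    )


train_set, test_set, valid_set = partition(range(100))
print(len(train_set), len(test_set), len(valid_set))  # 80 10 10
```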
Dollar index (DX)
[Figure: box plot, density histogram and Q-Q plot of DX]
The box plot of the dollar index shows that the data is slightly positively skewed. The
Anderson-Darling and Kolmogorov-Smirnov tests reject the null hypothesis that the data is normally
distributed. The Q-Q plot gives the same impression that the data is skewed. Thus a transformation
of the data is required to make it linear and closer to a normal distribution.

Moments
N            102.0000     Sum Wgts       102.0000
Mean          94.1722     Sum           9605.5615
Std Dev       13.6329     Variance       185.8550
Skewness       0.3119     Kurtosis        -1.0547
USS        923347.937     CSS          18771.3510
CV            14.4765     Std Mean         1.3499
The skewness is 0.31. Taking the log of the data will bring it as close to normality as possible.

Quantiles
100% Max    119.4700     99.0%    119.1025
75% Q3      105.9950     97.5%    118.5225
50% Med      89.9988     95.0%    116.8100
25% Q1       83.4375     90.0%    115.4350
0% Min       72.1715     10.0%     77.3838
Range        47.2985      5.0%     74.8753
Q3-Q1        22.5575      2.5%     72.6353
Mode          .           1.0%     72.5500
Tests for Normality
Test                  Statistic Value    p-value
Shapiro-Wilk          0.945263           0.0004
Kolmogorov-Smirnov    0.129178           <.0100
Cramer-von Mises      0.290728           <.0050
Anderson-Darling      1.720633           <.0050
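To make the Kolmogorov-Smirnov statistic in the table more concrete, the sketch below computes the largest gap between a sample's empirical distribution and a normal distribution fitted to that sample. It is an illustration under assumptions: the function name and the synthetic samples are invented here, and it computes only the statistic, not the tabulated p-value that the statistical package reports.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Synthetic samples (not the study's data): evenly spaced normal quantiles,
# and the same values exponentiated to create a right-skewed sample.
size = 50
normal_like = [NormalDist().inv_cdf((i - 0.5) / size) for i in range(1, size + 1)]
right_skewed = [math.exp(z) for z in normal_like]

d_normal = ks_normal_statistic(normal_like)
d_skewed = ks_normal_statistic(right_skewed)
print(d_normal < d_skewed)  # True: the skewed sample sits further from normality
```

A larger statistic means a larger departure from the fitted normal curve, which is why the skewed sample scores higher.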
China Imports
[Figure: box plot and distribution plots of ChinaIm]
By the box plot we can see that the data is positively skewed and that there are outliers in the
data. Thus an outlier filter is applied to this variable.
Quantiles
100% Max    387943.000     99.0%    337230.000
75% Q3      191536.000     97.5%    317947.000
50% Med      86533.0000    95.0%    244013.000
25% Q1       58246.0000    90.0%    148679.000
0% Min         554.0000    10.0%     14204.0000
Range       378389.000      5.0%      1711.0000
Q3-Q1        66690.0000     2.5%      1490.0000
Mode          1711.0000     1.0%      1129.0000

Tests for Normality
Test                  Statistic Value    p-value
Shapiro-Wilk          0.852678           0.0000
Kolmogorov-Smirnov    0.136273           <.0100
Cramer-von Mises      0.516829           <.0050
Anderson-Darling      3.495119           <.0050
Total Stock of Copper
World Consumption
[Figure: box plot and distribution plots of WRCONS]
The data is close to a normal distribution but is not exactly normally distributed.
[...] hypotheses at 95%
Prices of Copper
[Figure: box plot, density histogram and Q-Q plot of Prices]
The price data is positively skewed. Although it does not have a time trend, it is highly skewed.
Taking the log of the data would bring it as close to normality as possible in this case.
The Q-Q plot clearly shows that the data is positively skewed and is not normally distributed. Even
taking the log does not improve the normality much.
Moments
N            102.0000     Sum Wgts       102.0000
Mean        3664.2462     Sum         373753.110
Std Dev     2413.9230     Variance    5827024.35
Skewness       0.8235     Kurtosis        -0.9189
USS         1.958E+09     CSS          588529459
CV            65.8778     Std Mean       239.0140

Quantiles
100% Max    8492.2500     99.0%    8341.2500
75% Q3      5849.7500     97.5%    8309.0000
50% Med     2714.1250     95.0%    7927.6250
25% Q1      1711.7500     90.0%    7694.2500
0% Min      1430.1250     10.0%    1519.5000
Range       7062.1250      5.0%    1495.2500
Q3-Q1       4138.0000      2.5%    1468.8750
Mode        1762.0000      1.0%    1455.5000

Tests for Normality
Test                  Statistic Value    p-value
Shapiro-Wilk          0.799977           0.0000
Kolmogorov-Smirnov    0.218950           <.0100
Cramer-von Mises      1.366637           <.0050
Anderson-Darling      8.129960           <.0050
The above table shows the transformations applied to the different variables used in the analysis.
After transformation, the skewness of the data has decreased to a great extent for most of the
variables. After the variables are transformed, the outlier filter is run to remove the outliers
that the distribution analysis revealed in some of the variables.
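The effect of a log transformation on skewness can be illustrated with a small sketch. It uses the bias-adjusted sample skewness formula, which is assumed here to be the convention behind the reported moments. The data is synthetic and chosen so that its logs are exactly symmetric; it is not the study's data.

```python
import math


def sample_skewness(values):
    """Bias-adjusted skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Synthetic right-skewed series: exponentials of an evenly spaced, symmetric grid.
raw = [math.exp(k / 4) for k in range(-8, 9)]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)        # clearly positive
skew_logged = sample_skewness(logged)  # essentially zero
print(skew_raw > abs(skew_logged))     # True
```

Real series rarely become perfectly symmetric after taking logs, which is consistent with the remark above that the log improves the normality of prices only a little.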
When the regression analysis was run, the dollar index turned out to have the greatest impact on
copper prices, followed by imports of copper by China. Total consumption of copper does not pass the
t-test here and thus cannot be classified as a variable having an impact on copper prices. Also,
fundamentally, Chinese imports and the dollar index already discount the impact of total consumption,
as the USA and China are the largest users of copper.
Time also stands out as a variable having some impact on copper prices. Total stock of copper has
the least impact on prices. Overall, the regression model explains close to 90% of the variation in
copper prices.
Model Information
Analysis of Variance
Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Parameter    DF    Estimate    Standard Error    t Value    Pr > |t|
To remove world consumption from the list of independent variables, the regression analysis was run
again with 4 variables, and the final model was thus selected. The F-test gives a value of 218.47
with a very low p-value, which rejects the null hypothesis that the model is not a good fit. Hence we
accept the alternative hypothesis that the model is a good fit.
The above table gives the results for the different partitions of the data. The results are quite
close for average standard error and maximum absolute error. Thus it is concluded that China's
imports and the dollar index are the top two variables having an impact on copper prices. These
variables are followed by the time factor and world total stocks of copper.
[Figure: histogram of the regression residuals]
Series: RESID
Sample 1 127
Observations 127
Mean           1.49E-14
Median        -0.025230
Maximum        0.533643
Minimum       -0.385356
Std. Dev.      0.202947
Skewness       0.391824
Kurtosis       2.364538
Jarque-Bera    5.386476
Probability    0.067662
Jarque-Bera test for normality of residuals: The JB test is used to test the normality assumption
for the residuals. The assumption that the residuals are normally distributed is important. The JB
test fails to reject the null hypothesis of normally distributed residuals at the 5% level
(p = 0.0677), so the assumption holds.
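The reported Jarque-Bera value can be reproduced from the residual statistics listed above. The sketch uses the standard JB formula and the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2). The function name is an assumption for illustration.

```python
import math


def jarque_bera(n_obs, skewness, kurtosis):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with K the raw (non-excess) kurtosis."""
    return n_obs / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Values taken from the residual summary: 127 observations, S = 0.391824, K = 2.364538.
jb = jarque_bera(127, 0.391824, 2.364538)

# Under the null, JB is approximately chi-square with 2 df, whose upper tail is exp(-x/2).
p_value = math.exp(-jb / 2.0)

print(round(jb, 3), round(p_value, 3))  # approximately 5.386 0.068
```

Both numbers agree with the reported statistic (5.386476) and probability (0.067662) up to rounding of the inputs. Since the p-value exceeds 0.05, normality of the residuals is not rejected at the 5% level.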
White heteroskedasticity test for residuals: The White test checks for heteroskedasticity, or
non-constant variance, in the residual terms. If there is heteroskedasticity, forecasting with the
model becomes a problem because the variance of the error terms keeps changing; the spread of the
error terms should stay within a stable range. The test takes homoskedasticity as its null
hypothesis, which is not rejected when the p-value is high and the F-statistic is small. In this
test the assumption of homoskedastic residuals holds.
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/04/10 Time: 10:20
Sample: 1 127
Included observations: 127
Newey-West HAC Standard Errors & Covariance (lag truncation=4)
Variable Coefficient Std. Error t-Statistic Prob.
C -83.13552 332.3502 -0.250144 0.8029
CHINA -0.031927 0.020330 -1.570442 0.1191
CHINA^2 -3.55E-07 6.41E-07 -0.554688 0.5802
CHINA*LW 0.002098 0.001367 1.534175 0.1278
CHINA*LDX 0.000573 0.000788 0.726770 0.4689
CHINA*TIME -2.93E-06 5.33E-06 -0.550778 0.5829
LW 10.22270 46.34195 0.220593 0.8258
LW^2 -0.426168 1.641817 -0.259571 0.7957
LW*LDX 0.288200 1.327384 0.217119 0.8285
LW*TIME -0.002887 0.009718 -0.297043 0.7670
LDX 6.001456 18.48112 0.324734 0.7460
LDX^2 -1.069046 0.839763 -1.273033 0.2056
LDX*TIME -0.006501 0.007509 -0.865759 0.3885
TIME 0.072793 0.137160 0.530716 0.5967
TIME^2 -6.93E-06 1.89E-05 -0.367772 0.7137
R-squared 0.162912 Mean dependent var 0.040863
Adjusted R-squared 0.058276 S.D. dependent var 0.047923
S.E. of regression 0.046506 Akaike info criterion -3.187958
Sum squared resid 0.242230 Schwarz criterion -2.852030
Log likelihood 217.4353 F-statistic 1.556945
Durbin-Watson stat 1.147569 Prob(F-statistic) 0.102774
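The F-statistic of the test equation can be checked from its R-squared, the number of observations and the number of regressors. The sketch below also computes the Obs*R-squared form of White's statistic, which does not appear in the output above and is included only as the usual alternative form. The function name is an assumption, and 23.685 is the standard 5% critical value of a chi-square distribution with 14 degrees of freedom.

```python
def f_from_r2(r2, n_obs, n_regressors):
    """Overall F-statistic of a regression with an intercept, computed from R-squared."""
    return (r2 / n_regressors) / ((1.0 - r2) / (n_obs - n_regressors - 1))


n_obs = 127
n_regressors = 14  # terms in the test equation other than the constant C
r2 = 0.162912      # R-squared of the regression of RESID^2 on those terms

f_stat = f_from_r2(r2, n_obs, n_regressors)
lm_stat = n_obs * r2  # Obs*R-squared, approximately chi-square with 14 df under the null

CHI2_CRIT_5PCT_14DF = 23.685  # standard table value

print(round(f_stat, 3), round(lm_stat, 2))  # approximately 1.557 20.69
print(lm_stat < CHI2_CRIT_5PCT_14DF)        # True: homoskedasticity is not rejected
```

The recomputed F-statistic matches the reported 1.556945, and both forms of the test lead to the same conclusion as the reported Prob(F-statistic) of 0.102774.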