MATH 533 Part C - Regression and Correlation Analysis

MATH 533: Applied Managerial Statistics
Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on CREDIT
BALANCE (Y) and SIZE (X) by answering the following.
1. Generate a scatterplot for CREDIT BALANCE vs. SIZE, including the graph of the "best
fit" line. Interpret.
[Figure: Scatterplot of Credit Balance($) vs Size, with the best fit line. Vertical axis:
Credit Balance($); horizontal axis: Size.]
The scatterplot of Credit Balance ($) versus Size shows that the best fit line slopes
upward (positive), which indicates that Credit Balance varies directly with Size: as Size
increases, Credit Balance tends to increase, and vice versa. Correct
MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Size

The regression equation is
Credit Balance($) = 2591 + 403 Size

Predictor    Coef  SE Coef      T      P
Constant   2591.4    195.1  13.29  0.000
Size       403.22    50.95   7.91  0.000

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

Predicted Values for New Observations
New Obs     Fit  SE Fit            95% CI            95% PI
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations
New Obs  Size
      1  5.00
2. Determine the equation of the "best fit" line, which describes the relationship between
CREDIT BALANCE and SIZE.
The equation of the best fit line, which describes the relationship between Credit Balance
and Size, is
Credit Balance ($) = 2591 + 403.2 Size Correct
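As a quick check, the fitted line can be evaluated directly. The following is a minimal Python sketch that uses the unrounded coefficients from the MINITAB "Coef" column; the function name predict_balance is illustrative only and does not come from the assignment.

```python
# Coefficients taken from the MINITAB "Coef" column.
INTERCEPT = 2591.4   # Constant
SLOPE = 403.22       # Size

def predict_balance(size):
    """Point estimate of Credit Balance ($) for a given household size."""
    return INTERCEPT + SLOPE * size

# Evaluate the line at a household size of 5 and compare with the
# "Fit" value that MINITAB reports for Size = 5.00.
fit_at_5 = predict_balance(5)
print(round(fit_at_5, 1))
```

The slope means that each additional household member is associated with about $403 more in estimated credit balance.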
3. Determine the coefficient of correlation. Interpret.

The coefficient of correlation is r = 0.752. Its positive sign shows a direct relationship
between Credit Balance and Size, and its magnitude indicates a moderately strong linear
association. The p-value of 0.000 is far below 0.05, so it is extremely unlikely that a
correlation this strong would arise by chance if Credit Balance and Size were
unrelated. Correct
MINITAB OUTPUT:
Pearson correlation of Credit Balance ($) and Size = 0.752
P-Value = 0.000
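The reported r can be cross-checked against the regression output, because in simple linear regression r squared equals R-Sq, and the test of Ho: ρ = 0 uses t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below is in Python and assumes n = 50 (Total DF of 49 plus 1), with the sums of squares copied from the Analysis of Variance table.

```python
import math

SS_REGRESSION = 24092210   # from the Analysis of Variance table
SS_TOTAL = 42553062
n = 50                     # assumed: Total DF (49) + 1

r_squared = SS_REGRESSION / SS_TOTAL
# The fitted slope is positive, so r takes the positive root.
r = math.sqrt(r_squared)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(round(r, 3), round(t_stat, 2))
```

The t statistic obtained this way agrees with the T value MINITAB reports for Size, which is expected because the two tests are equivalent when there is a single predictor.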
4. Determine the coefficient of determination. Interpret.

The coefficient of determination is R-Sq = 0.566. The coefficient of determination, R²,
gives the proportion of the variability in Credit Balance that is accounted for by the
regression model; for this model, 56.6% of the variation in Credit Balance is explained
by Size. Correct
MINITAB OUTPUT:
S = 620.162
R-Sq = 56.6%
R-Sq(adj) = 55.7%
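R-Sq can be reproduced from the sums of squares in the Analysis of Variance table, either as the explained share or as one minus the unexplained share. A short Python sketch (variable names are illustrative):

```python
SS_REGRESSION = 24092210
SS_ERROR = 18460853
SS_TOTAL = 42553062

# Share of the total variation explained by the regression on Size.
r_sq = SS_REGRESSION / SS_TOTAL
# Equivalent form: one minus the unexplained share
# (tiny differences come from rounding in the printed table).
r_sq_alt = 1 - SS_ERROR / SS_TOTAL

print(f"{r_sq:.1%}")
```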
5. Test the utility of this regression model (use a two tail test with α = .05). Interpret your
results, including the p-value.
The null hypothesis, Ho, states that there is no significant correlation, i.e. the correlation
coefficient ρ = 0 (equivalently, the slope of the regression line is 0).
The Significance Level, α = 0.05

Decision Rule: Reject Ho if p-value < 0.05
From the Analysis of Variance table, I find that the p-value is 0.000, which is much less
than 0.05. Therefore, I reject the null hypothesis of no significant correlation and
conclude that, according to the overall test of significance, the regression model is
useful for predicting Credit Balance from Size. Correct
MINITAB OUTPUT:
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062
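The F statistic in the Analysis of Variance table is the ratio of the regression mean square to the residual mean square. With a single predictor it should also equal, up to rounding, the square of the slope's t statistic (Coef divided by SE Coef). A Python sketch of both calculations, using only figures from the MINITAB output:

```python
SS_REGRESSION, DF_REGRESSION = 24092210, 1
SS_ERROR, DF_ERROR = 18460853, 48

ms_regression = SS_REGRESSION / DF_REGRESSION
ms_error = SS_ERROR / DF_ERROR
f_stat = ms_regression / ms_error

# With one predictor, F matches the squared t statistic of the slope
# (small differences come from rounding in the printed coefficients).
t_slope = 403.22 / 50.95   # Coef / SE Coef for Size
print(round(f_stat, 2), round(t_slope ** 2, 2))
```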
6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT
BALANCE? Explain.
Based on my findings, Size is a useful predictor of Credit Balance. The two variables are
positively correlated (r = 0.752), the regression is statistically significant (p-value =
0.000), and as the size of a household grows, its credit balance tends to grow as well.
However, Size alone explains only 56.6% of the variation in Credit Balance, so predictions
for individual customers remain fairly imprecise. Correct
7. Compute the 95% confidence interval for . Interpret this interval.

N/A
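The symbol after "confidence interval for" is missing from the question as it appears here. If it refers to the slope coefficient of Size (an assumption, not confirmed by this document), the interval could be computed as Coef ± t × SE Coef with 48 degrees of freedom. The Python sketch below follows that assumption; the critical value 2.0106 is taken from a standard t table, not from the MINITAB output.

```python
COEF_SIZE = 403.22      # Coef for Size, from the MINITAB output
SE_COEF_SIZE = 50.95    # SE Coef for Size, from the MINITAB output
T_CRIT_48 = 2.0106      # assumed: t(0.975, df = 48) from a t table

margin = T_CRIT_48 * SE_COEF_SIZE
slope_ci = (COEF_SIZE - margin, COEF_SIZE + margin)
print(tuple(round(v, 1) for v in slope_ci))
```

Under that assumption, each additional household member would be associated with roughly $301 to $506 more in average credit balance, and the interval excludes 0, which is consistent with the significant slope found in question 5.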
8. Using an interval, estimate the average credit balance for customers that have household
size of 5. Interpret this interval.
The average credit balance for customers with a household size of 5 is estimated to lie
within the interval (4368.2, 4846.9). This is the 95% confidence interval: we are 95%
confident that the mean credit balance of all customers with a household size of 5 falls
between $4,368.20 and $4,846.90. Correct
MINITAB OUTPUT:
New Obs     Fit  SE Fit            95% CI            95% PI
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)

New Obs  Size
      1  5.00
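The 95% CI can be approximately reproduced from the Fit and SE Fit columns as Fit ± t × SE Fit. The Python sketch below assumes the critical value t(0.975, df = 48) = 2.0106 from a standard t table; small differences from the MINITAB interval are expected because SE Fit is printed to one decimal place.

```python
FIT = 4607.5        # Fit at Size = 5
SE_FIT = 119.0      # SE Fit at Size = 5
T_CRIT_48 = 2.0106  # assumed: t(0.975, df = 48) from a t table

half_width = T_CRIT_48 * SE_FIT
ci_mean = (FIT - half_width, FIT + half_width)
print(tuple(round(v, 1) for v in ci_mean))
```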
9. Using an interval, predict the credit balance for a customer that has a household size of 5.
Interpret this interval.
The credit balance for an individual customer with a household size of 5 is predicted to
lie within the interval (3337.9, 5877.2). This is the 95% prediction interval; it is wider
than the confidence interval for the mean because it also accounts for the variation of
individual customers around that mean. Correct
MINITAB OUTPUT:
New Obs     Fit  SE Fit            95% CI            95% PI
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)

New Obs  Size
      1  5.00
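The extra width of the prediction interval comes from adding the residual variance S² to the variance of the fitted mean. The Python sketch below uses S = 620.162 from the simple regression output and, as an assumption, the critical value t(0.975, df = 48) = 2.0106 from a standard t table.

```python
import math

FIT = 4607.5        # Fit at Size = 5
SE_FIT = 119.0      # SE Fit at Size = 5
S = 620.162         # residual standard deviation of the simple regression
T_CRIT_48 = 2.0106  # assumed: t(0.975, df = 48) from a t table

# A single new observation carries the residual variance S^2 on top of
# the uncertainty in the fitted mean.
se_pred = math.sqrt(S ** 2 + SE_FIT ** 2)
half_width = T_CRIT_48 * se_pred
pred_interval = (FIT - half_width, FIT + half_width)
print(tuple(round(v, 1) for v in pred_interval))
```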
10. What can we say about the credit balance for a customer that has a household size of 10?
Explain your answer.
We cannot reliably say anything about the credit balance for a customer with a household
size of 10. The largest value of the predictor variable (Size) used to fit the regression
model is 7, which is well below 10, so a prediction at Size = 10 would be an extrapolation
beyond the range of the data, and the model cannot be used to accurately estimate the
credit balance for that customer. Correct
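One way to make this restriction explicit is to refuse predictions outside the range of Size values in the sample. The Python sketch below is illustrative only: the function name is hypothetical, and the range 1 to 7 is taken from the Size levels listed elsewhere in this report.

```python
SIZE_MIN, SIZE_MAX = 1, 7   # observed range of Size in the sample

def predict_balance_checked(size):
    """Predict Credit Balance ($) only inside the observed Size range."""
    if not SIZE_MIN <= size <= SIZE_MAX:
        raise ValueError(
            f"Size {size} is outside the observed range "
            f"{SIZE_MIN}-{SIZE_MAX}; the fitted line is not validated there."
        )
    # Coefficients from the simple regression output.
    return 2591.4 + 403.22 * size

print(round(predict_balance_checked(5), 1))   # inside the range
try:
    predict_balance_checked(10)               # extrapolation
except ValueError as err:
    print(err)
```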
In an attempt to improve the model, we fit a multiple regression model predicting
CREDIT BALANCE based on INCOME, SIZE and YEARS.
11. Using MINITAB run the multiple regression analysis using the variables INCOME, SIZE
and YEARS to predict CREDIT BALANCE. State the equation for this multiple
regression model.
MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Income ($1000), Size, Years

The regression equation is
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years

Predictor         Coef  SE Coef     T      P
Constant        1276.0    273.6  4.66  0.000
Income ($1000)  32.272    4.348  7.42  0.000
Size            346.85    36.03  9.63  0.000
Years             7.88    12.34  0.64  0.526

S = 424.715   R-Sq = 80.5%   R-Sq(adj) = 79.2%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  34255444  11418481  63.30  0.000
Residual Error  46   8297619    180383
Total           49  42553062

Source          DF    Seq SS
Income ($1000)   1  16703393
Size             1  17478430
Years            1     73620

Unusual Observations
      Income      Credit
Obs  ($1000)  Balance($)     Fit  SE Fit  Residual  St Resid
  3     32.0      5100.0  3830.1    93.7    1269.9     3.07R
  5     31.0      1864.0  3001.7   139.3   -1137.7    -2.84R
 11     25.0      4208.0  3210.1   103.3     997.9     2.42R
 17     55.0      4412.0  5250.3   116.3    -838.3    -2.05R

R denotes an observation with a large standardized residual.

The multiple regression equation is:
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years
Correct
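The multiple regression equation can be evaluated in the same way as the simple one. The Python sketch below uses the unrounded coefficients from the "Coef" column; the function name and the example customer (income of $40,000, household size 3, 10 years) are hypothetical and are not taken from the data set.

```python
# Coefficients from the MINITAB multiple regression output.
COEFS = {"intercept": 1276.0, "income_k": 32.272, "size": 346.85, "years": 7.88}

def predict_balance_multi(income_k, size, years):
    """Point estimate of Credit Balance ($); income_k is income in $1000."""
    return (COEFS["intercept"]
            + COEFS["income_k"] * income_k
            + COEFS["size"] * size
            + COEFS["years"] * years)

# Hypothetical customer, for illustration only.
example = predict_balance_multi(income_k=40, size=3, years=10)
print(round(example, 2))
```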
12. Perform the Global Test for Utility (F-Test). Explain your conclusion.
The null hypothesis, Ho, states that none of the independent variables (Income, Size,
Years) has a significant linear relationship with Credit Balance, i.e. all of their
coefficients equal 0.
Significance Level, α = 0.05

Decision Rule: Reject Ho if p-value < 0.05
From the Analysis of Variance table, we find that the p-value (0.000) is much less than
0.05. Therefore, we reject the null hypothesis and conclude that, according to the overall
test of significance, the multiple regression model is useful: at least one of the
independent variables contributes significantly to predicting Credit Balance. Correct
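The global F statistic and the standard error S can be recomputed from the sums of squares of the multiple regression. The Python sketch below assumes n = 50 observations (Total DF of 49 plus 1) and k = 3 predictors.

```python
import math

SS_REGRESSION = 34255444   # from the multiple regression ANOVA table
SS_ERROR = 8297619
n, k = 50, 3               # observations, predictors

ms_regression = SS_REGRESSION / k
ms_error = SS_ERROR / (n - k - 1)

f_stat = ms_regression / ms_error   # global test of all three slopes
s = math.sqrt(ms_error)             # residual standard deviation

print(round(f_stat, 2), round(s, 3))
```

Both values can be compared with F and S in the MINITAB multiple regression output.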
MINITAB OUTPUT:
Test for Equal Variances: Credit Balance($) versus Income ($1000)

95% Bonferroni confidence intervals for standard deviations

 Income
($1000)  N    Lower    StDev   Upper
     21  2  267.855   830.85  344720
     22  2  188.069   583.36  242037
     23  1        *        *       *
     25  1        *        *       *
     26  1        *        *       *
     27  2  101.215   313.96  130260
     29  1        *        *       *
     30  3  123.736   309.43    7053
     31  1        *        *       *
     32  1        *        *       *
     33  1        *        *       *
     34  1        *        *       *
     35  1        *        *       *
     37  2  328.265  1018.23  422465
     39  2  276.062   856.31  355281
     40  1        *        *       *
     41  1        *        *       *
     42  1        *        *       *
     44  1        *        *       *
     46  1        *        *       *
     48  2   80.471   249.61  103563
     50  2  259.193   803.98  333571
     51  1        *        *       *
     52  1        *        *       *
     54  3  396.622   991.86   22607
     55  4  290.865   647.76    5780
     61  1        *        *       *
     62  2  221.807   688.01  285457
     63  1        *        *       *
     64  1        *        *       *
     65  1        *        *       *
     66  2   87.765   272.24  112951
     67  2   70.212   217.79   90361

Bartlett's Test (Normal Distribution)

Test statistic = 5.59, p-value = 0.935
Levene's Test (Any Continuous Distribution)
Test for Equal Variances: Credit Balance($) versus Size

Size   N    Lower    StDev    Upper
   1   5  137.540  271.807  1303.27
   2  15  459.836  698.998  1337.23
   3   8  193.542  336.323   943.85
   4   9  415.251  701.689  1796.00
   5   5  340.696  673.284  3228.28
   6   5  360.277  711.981  3413.83
   7   3  150.085  356.267  5956.16

Test for Equal Variances: Credit Balance($) versus Years

Years  N    Lower    StDev   Upper
    1  2  541.930  1714.03  875261
    2  1        *        *       *
    4  2  452.950  1432.60  731550
    5  2  130.788   413.66  211232
    6  1        *        *       *
    7  2   78.920   249.61  127462
    8  1        *        *       *
    9  2   76.013   240.42  122768
   10  2  135.483   428.51  218815
   11  4  204.115   461.26    4413
   12  4  348.641   787.86    7538
   13  4  167.957   379.55    3631
   14  5  584.321  1221.32    7236
   15  3  232.333   590.58   14935
   16  4  231.705   523.61    5010
   17  2  111.114   351.43  179457
   18  5  452.721   946.25    5607
   19  2  121.398   383.96  196067
   20  2  540.589  1709.78  873094

Conclusion: because the p-value of Bartlett's Test (Normal Distribution) is greater than
0.05, I am unable to reject the null hypothesis of equal variances. Levene's Test, which
does not assume normality, also fails to reject the null hypothesis of equal variances.
13. Perform the t-test on each independent variable. Explain your conclusions and clearly
state how you should proceed. In particular, which independent variables should we keep
and which should be discarded.
Test the significance of the individual coefficients of the independent variables.
The null hypothesis, Ho, states that there is no significant correlation between the
independent variable and Credit Balance, i.e. the correlation coefficient ρ = 0.
Decision Rule: Reject Ho if p-value < 0.05
MINITAB OUTPUT:
Income ($1000)
Source          DF        SS        MS      F      P
Regression       1  16703393  16703393  31.02  0.000
Residual Error  48  25849669    538535
Total           49  42553062

Year
Source          DF        SS      MS     F      P
Regression       1      2878    2878  0.00  0.955
Residual Error  48  42550184  886462
Total           49  42553062

Size
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062
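The independent variables Income ($1000) and Size should be kept because each makes a
significant contribution to the regression model (p-value = 0.000 < 0.05). The variable
Years should be discarded because it does not make a significant contribution (p-value =
0.955 above, and p-value = 0.526 for its t-test in the multiple regression output, both
greater than 0.05). The model should then be re-run with Income and Size only. Correct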
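The individual t statistics in the multiple regression output are each coefficient divided by its standard error, compared against a critical value with 46 degrees of freedom. The Python sketch below assumes t(0.975, df = 46) = 2.0129 from a standard t table; the coefficient and standard error pairs are copied from the multiple regression output.

```python
T_CRIT_46 = 2.0129  # assumed: t(0.975, df = 46) from a t table

# (Coef, SE Coef) pairs from the multiple regression output.
predictors = {
    "Income ($1000)": (32.272, 4.348),
    "Size": (346.85, 36.03),
    "Years": (7.88, 12.34),
}

t_stats = {name: coef / se for name, (coef, se) in predictors.items()}

# Keep a predictor only if its t statistic exceeds the critical value.
keep = [name for name, t in t_stats.items() if abs(t) > T_CRIT_46]
drop = [name for name in predictors if name not in keep]

for name, t in t_stats.items():
    print(f"{name}: T = {t:.2f}")
print("keep:", keep, "drop:", drop)
```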
14. Is this multiple regression model better than the linear model that we generated in
parts 1-10? Explain.
The proportion of variability in the data that is accounted for by a model is given by the
coefficient of determination, R-Sq, so a higher R-Sq indicates a better fit. R-Sq is
greater for the multiple regression model (0.805) than for the linear regression model
(0.566). Because R-Sq never decreases when predictors are added, R-Sq(adj) gives a fairer
comparison, and it is also higher for the multiple regression model (79.2% versus 55.7%).
Hence the multiple regression model is better than the linear regression model. Correct
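The comparison can also be made on adjusted R-Sq, which penalizes each added predictor: 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]. The Python sketch below applies this to both models using the residual and total sums of squares from their Analysis of Variance tables, assuming n = 50; the helper name is illustrative.

```python
def adjusted_r_sq(ss_error, ss_total, n, k):
    """Adjusted R-Sq for a model with k predictors fitted to n observations."""
    return 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))

SS_TOTAL, n = 42553062, 50   # n assumed from Total DF (49) + 1

adj_simple = adjusted_r_sq(18460853, SS_TOTAL, n, k=1)    # Size only
adj_multiple = adjusted_r_sq(8297619, SS_TOTAL, n, k=3)   # Income, Size, Years

print(f"{adj_simple:.1%} {adj_multiple:.1%}")
```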
Project Part C: Grading Rubric

Category                                   Points Value  Your Points  Description
Questions 1 - 12 and 14 (5 pts. each;      65            65           addressed with appropriate output, graphs and interpretations
  everyone gets credit for No. 7)
Question 13                                15            15           addressed with appropriate output, graphs and interpretations
Summary                                    20            20           writing, grammar, clarity, logic, and cohesiveness
Total                                      100           100          A quality paper will meet or exceed all of the above requirements.
