Pa Cia 2

Problem statement: To identify customers who will be retained by the telecom
operator and to predict churn
Predictor variables:
Account length
VMail Message
Day Mins
Eve Mins
Night Mins
Intl Mins
CustServ Calls
Day Calls
Day Charge
Eve Calls
Eve Charge
Night Calls
Night Charge
Intl Calls
Intl Charge
State
Area Code
Dependent variable:
CHURN
Non predictor: phone number
2) Before running the multicollinearity test, the data should be binned

Unbinned variables are
Descriptive Statistics

Variable             N      Minimum   Maximum   Mean       Std. Deviation
Account Length       3333             243       101.06     39.822
VMail Message        3333             51        8.10       13.688
Day Mins             3333   .0        350.8     179.775    54.4674
Eve Mins             3333   .0        363.7     200.980    50.7138
Night Mins           3333   23.2      395.0     200.872    50.5738
Intl Mins            3333             20        10.24      2.792
Day Calls            3333             165       100.44     20.069
Day Charge           3333   .00       59.64     30.5623    9.25943
Eve Calls            3333             170       100.11     19.923
Eve Charge           3333   .00       30.91     17.0835    4.31067
Night Calls          3333   33        175       100.11     19.569
Night Charge         3333   1.04      17.77     9.0393     2.27587
Intl Charge          3333   .0        5.4       2.765      .7538
Area Code            3333   408       510       437.18     42.371
Valid N (listwise)   3333
Multicollinearity on binned data is shown below
In the above table, the VIF for every variable is less than 10, which means no
variable needs to be dropped
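The VIF rule of thumb used here (flag a variable when its VIF reaches 10) can be illustrated with a small sketch. This is not the SPSS procedure used in the report. It assumes the standard identity that the VIF of predictor j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The helper names (pearson, invert, vif) and the synthetic columns are hypothetical and only loosely modelled on the variable names above.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {name: VIF} for a dict mapping predictor names to value lists."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical synthetic data, NOT the telecom dataset:
# day_charge is built to be almost a linear function of day_mins.
random.seed(0)
day_mins = [random.uniform(0, 350) for _ in range(500)]
day_charge = [0.17 * m + random.gauss(0, 0.5) for m in day_mins]
night_calls = [random.gauss(100, 20) for _ in range(500)]

vifs = vif({"day_mins": day_mins, "day_charge": day_charge,
            "night_calls": night_calls})
for name, value in vifs.items():
    flag = "high, consider dropping" if value >= 10 else "ok"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

In this synthetic example the two nearly collinear columns get a very large VIF while the independent column stays close to 1, which is the pattern the threshold of 10 is meant to catch.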
Decision tree
KMO value
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy       .510
Bartlett's Test of Sphericity    Approx. Chi-Square   7421.300
                                 df                   120
                                 Sig.                 .000
Split 70:30 Method CHAID

Classification

Sample     Observed             Predicted 0   Predicted 1   Percent Correct
Training   0                    1966          48            97.6%
           1                    246           83            25.2%
           Overall Percentage   94.4%         5.6%          87.5%
Test       0                    805           31            96.3%
           1                    129           25            16.2%
           Overall Percentage   94.3%         5.7%          83.8%

Growing Method: CHAID
Dependent Variable: Churn
Training accuracy is 87.5% and testing accuracy is 83.8%, which means a new sample fed into
this model can be expected to be classified correctly about 83.8% of the time
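The percentages in the CHAID 70:30 classification table can be recomputed from the raw counts. The sketch below is a minimal illustration, with a hypothetical helper name (summarize), that takes counts indexed as matrix[observed][predicted] and returns the overall accuracy and the percent correct per observed class.

```python
def summarize(matrix):
    """Return (overall accuracy %, [percent correct per observed class]).

    matrix[observed][predicted] holds the case counts for classes 0 and 1.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    per_class = [100 * matrix[k][k] / sum(matrix[k]) for k in range(len(matrix))]
    return 100 * correct / total, per_class

# Counts copied from the CHAID 70:30 classification table.
chaid_train = [[1966, 48], [246, 83]]
chaid_test = [[805, 31], [129, 25]]

for label, m in (("Training", chaid_train), ("Test", chaid_test)):
    overall, per_class = summarize(m)
    print(f"{label}: overall {overall:.1f}%, "
          f"class 0 {per_class[0]:.1f}%, class 1 {per_class[1]:.1f}%")
# Training: overall 87.5%, class 0 97.6%, class 1 25.2%
# Test: overall 83.8%, class 0 96.3%, class 1 16.2%
```

The same function can be applied to the other classification tables in this report by substituting their counts.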
Split 70:30 Method CRT
Classification

Sample     Observed             Predicted 0   Predicted 1   Percent Correct
Training   0                    1983          69            96.6%
           1                    172           177           50.7%
           Overall Percentage   89.8%         10.2%         90.0%
Test       0                    767           31            96.1%
           1                    62            72            53.7%
           Overall Percentage   88.9%         11.1%         90.0%

Growing Method: CRT
Training accuracy is 90% and testing accuracy is 90%, which means a new sample fed into this
model can be expected to be classified correctly about 90% of the time
Split 60:40 Method CHAID
Classification

Sample     Observed             Predicted 0   Predicted 1   Percent Correct
Training   0                    1703          32            98.2%
           1                    193           86            30.8%
           Overall Percentage   94.1%         5.9%          88.8%
Test       0                    1095          20            98.2%
           1                    147           57            27.9%
           Overall Percentage   94.2%         5.8%          87.3%
Classification

Sample     Observed             Predicted 0   Predicted 1   Percent Correct
Training   0                    1542          142           91.6%
           1                    117           172           59.5%
           Overall Percentage   84.1%         15.9%         86.9%
Test       0                    1078          88            92.5%
           1                    92            102           52.6%
           Overall Percentage   86.0%         14.0%         86.8%

Growing Method: CRT
Cross validation CRT

Classification

Observed             Predicted 0   Predicted 1   Percent Correct
0                    2750          100           96.5%
1                    234           249           51.6%
Overall Percentage   89.5%         10.5%         90.0%

Growing Method: CRT

Cross validation CHAID

Classification

Observed             Predicted 0   Predicted 1   Percent Correct
0                    2698          152           94.7%
1                    241           242           50.1%
Overall Percentage   88.2%         11.8%         88.2%
After analysing the above confusion matrices

5) A tabular format is shown below
CRT with a 70:30 split gives a training accuracy of 90% and a testing accuracy of 90%
The best method is CRT with a 70:30 split
Logistic regression
Classification Table (a)

                                Predicted
                                Churn              Percentage
Observed                        0         1        Correct
Step 1   Churn   0              2759      30       98.9
                 1              436       42       8.8
         Overall Percentage                        85.7

a. The cut value is .500
The logistic regression model classifies 85.7% of cases correctly, which is a good enough measure
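One way to put the 85.7% figure in context is to compare it with the accuracy of always predicting the larger class, and to look at the two classes separately. This comparison is an added sanity check and not part of the original SPSS output. The counts are copied from the logistic regression classification table, and the variable names are illustrative.

```python
# Counts from the logistic regression classification table: table[observed][predicted]
table = [[2759, 30], [436, 42]]

n = sum(sum(row) for row in table)
accuracy = (table[0][0] + table[1][1]) / n
baseline = max(sum(row) for row in table) / n    # always predict the larger class
sensitivity = table[1][1] / sum(table[1])        # churners correctly identified
specificity = table[0][0] / sum(table[0])        # non-churners correctly identified

print(f"accuracy    {accuracy:.1%}")     # 85.7%
print(f"baseline    {baseline:.1%}")     # 85.4%
print(f"sensitivity {sensitivity:.1%}")  # 8.8%
print(f"specificity {specificity:.1%}")  # 98.9%
```

Read this way, the overall accuracy is close to the majority-class baseline, and most of it comes from the non-churn class, so the per-class rows of the table are worth reading alongside the overall percentage.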
Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
Step 1   Step     357.955      15   .000
         Block    357.955      15   .000
         Model    357.955      15   .000
Sig. is less than 0.05, which means the predictor variables have a significant impact on churn
Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2361.871            .104                   .184

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
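The two pseudo R Square values can be cross-checked against the other reported figures. The sketch below uses the standard Cox & Snell and Nagelkerke formulas. It assumes that the number of cases is the sum of the four cells of the logistic classification table, and that the omnibus model chi-square equals the drop in -2 log likelihood from the intercept-only model. Both are assumptions about how the output was produced, not statements made in the report.

```python
import math

n = 2759 + 30 + 436 + 42       # cases in the classification table (assumed sample size)
chi_square = 357.955           # omnibus model chi-square
neg2ll_model = 2361.871        # -2 log likelihood of the fitted model
neg2ll_null = neg2ll_model + chi_square   # implied -2LL of the intercept-only model

# Cox & Snell: 1 - (L0 / L1)^(2/n), written in terms of -2LL differences.
cox_snell = 1 - math.exp(-chi_square / n)
# Nagelkerke: Cox & Snell rescaled by its maximum attainable value.
nagelkerke = cox_snell / (1 - math.exp(-neg2ll_null / n))

print(round(cox_snell, 3), round(nagelkerke, 3))  # 0.104 0.184
```

Both values agree with the Model Summary table, which suggests the reported chi-square, -2 log likelihood and sample size are mutually consistent.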
Neural network
Classification

Sample     Observed          Predicted 0   Predicted 1   Percent Correct
Training   0                 1933          53            97.3%
           1                 251           77            23.5%
           Overall Percent   94.4%         5.6%          86.9%
Testing    0                 774           29            96.4%
           1                 108           41            27.5%
           Overall Percent   92.6%         7.4%          85.6%
Area Under the Curve

            Area
Churn   0   .788
        1   .788
From the above table and ROC curve
Accuracy is 86.9% for training and 85.6% for testing

From the ROC curve
The curve lies above the benchmark line, so the model can be accepted for testing future sample
data to determine churn
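The area under the ROC curve can be read as the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner, with 0.5 corresponding to the benchmark line. The sketch below computes it by direct pairwise comparison. The function name (auc) and the small set of labels and scores are hypothetical and are not taken from the neural network output.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted churn probabilities for eight customers.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60, 0.70, 0.90]

# 13 of the 15 churner/non-churner pairs are ranked correctly.
print(f"AUC = {auc(labels, scores):.3f}")
```

A model whose scores carry no information would score about 0.5 on this measure, so the reported .788 sits clearly above the benchmark without being close to a perfect 1.0.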
Performance evaluation
Decision tree accuracy is 90% with the 70:30 split and the CRT method

Logistic regression is 85.7% accurate
Neural network is 86.9% accurate on training data and 85.6% on testing data
Conclusion:
From the above analysis
The decision tree with a 70:30 split and the CRT method is the best technique
This method will help in predicting churn from the various factors in
future
