
Introduction MCA through the example Advanced comments Clustering

Multiple Correspondence Analysis


Julie Josse, François Husson, Sébastien Lê
Applied Mathematics Department, Agrocampus Ouest
useR-2008
Dortmund, August 11th 2008
MCA deals with which kind of data?
MCA deals with categorical variables, but continuous variables
can also be included in the analysis
Many examples (mostly from surveys); today we illustrate MCA with:
Questionnaire: tea consumers' habits
Ecological data
Multiple Correspondence Analysis
Generalization of PCA, generalization of CA
Analyses the pattern of relationships between several categorical
variables
Dimensionality reduction: sums up a data table
Factorial analysis: data visualisation, with many graphical
representations of the proximities between individuals and
between variables
Pre-processing: MCA before clustering
Tea data
300 individuals
3 kinds of variables:
the way you drink tea (18 questions): which kind of tea do you
drink? How do you drink it: with lemon, milk?
product perception (12 questions): is tea good for health?
Is it stimulating?
personal details (4 questions): sex, age, ...
Problems - objectives
Individuals study: similarity between individuals (with respect to
all the variables); partition of the individuals.
Individuals are different if they don't take the same levels
Variables study: find synthetic variables (continuous
variables that sum up the categorical variables); study of the links
between the variables' levels
Categories study:
two levels of different variables are similar if the individuals that
take these levels are the same (ex: 65 years and retired)
two levels are similar if the individuals taking these levels behave
the same way, i.e. they take the same levels for the other variables
(ex: 60 years and 65 years)
Link between these studies: characterization of the groups of
individuals by the levels (ex: dynamic executive women)
Indicator matrix
Binary coding of the factors: a factor with K_j levels becomes K_j
columns of binary values (0/1), also called dummy variables.

[Schematic: the indicator matrix — individuals in rows; for each
variable j (j = 1, ..., J), K_j dummy columns, with a single 1 per
individual and per variable.]
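The deck's own code is R (FactoMineR's `tab.disjonctif` does this coding); as a language-agnostic sketch, the dummy coding can be reproduced in a few lines. The helper name `indicator_matrix` is ours, not FactoMineR's:

```python
import numpy as np

def indicator_matrix(column, n_levels):
    """Dummy (one-hot) coding of one categorical variable
    whose levels are coded as integers 0 .. n_levels-1."""
    Z = np.zeros((len(column), n_levels), dtype=int)
    Z[np.arange(len(column)), column] = 1
    return Z

# 5 individuals, one factor with 3 levels -> 3 binary columns.
x = np.array([0, 2, 1, 0, 2])
Z = indicator_matrix(x, 3)
```

Each row sums to 1 (one level taken per variable), and the column sums are the category counts I_k.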
Burt matrix
This matrix concatenates all two-way cross-tabulations
between pairs of variables
The analysis of the Burt matrix only gives results for the levels
[Schematic: the Burt matrix — the (j, l) block is the
cross-tabulation of variable j with variable l.]
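Concretely, the Burt matrix is just Zᵀ Z for the full indicator matrix Z. A small sketch (in Python rather than the deck's R; the toy matrix is ours):

```python
import numpy as np

# Indicator matrix for J = 2 variables (3 + 2 levels), 4 individuals.
Z = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])
burt = Z.T @ Z
# The (j, j) diagonal block is diag(category counts of variable j);
# the (j, l) off-diagonal block is the cross-tab of variables j and l.
```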
History
At the beginning, when Correspondence Analysis algorithms were
available, someone had the idea of using these algorithms on the In-
dicator Matrix! You could see the Indicator Matrix (with a lot of
imagination!) as a contingency table crossing two categorical
variables. This strategy led to very interesting results: that is how
Multiple Correspondence Analysis was born. (Lebart)
Construction of the cloud of individuals
We need a distance between individuals:
Two individuals take the same levels: distance = 0
Two individuals take all the same levels except one uncommon
one: we want to put them far apart
Two individuals share a rare level: they should be
close even if they take different levels for the other variables
$$d^2_{i,i'} = \frac{I}{J}\sum_{k=1}^{K}\frac{1}{I_k}\,(x_{ik}-x_{i'k})^2 = \sum_{k=1}^{K}\frac{1}{I_k/(IJ)}\left(\frac{x_{ik}/(IJ)}{1/I}-\frac{x_{i'k}/(IJ)}{1/I}\right)^2$$

$$d^2_{\chi^2}(\text{row profile } i,\ \text{row profile } i') = \sum_{j=1}^{J}\frac{1}{f_{\cdot j}}\left(\frac{f_{ij}}{f_{i\cdot}}-\frac{f_{i'j}}{f_{i'\cdot}}\right)^2$$
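The two expressions above coincide, which can be checked numerically (a Python sketch rather than the deck's R; the simulated data and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
I_n, levels = 20, [3, 4, 2]          # I individuals, J variables
J = len(levels)
# Balanced-then-shuffled columns guarantee every level occurs.
cols = [rng.permutation(np.arange(I_n) % L) for L in levels]
Z = np.hstack([np.eye(L, dtype=int)[c] for c, L in zip(cols, levels)])
I_k = Z.sum(axis=0)                  # category counts

def d2_mca(i, ip):
    """Weighted distance on the raw indicator rows."""
    return (I_n / J) * np.sum((Z[i] - Z[ip]) ** 2 / I_k)

def d2_chi2(i, ip):
    """Chi-square distance between the CA row profiles of Z
    (row sums of Z are J, column masses are I_k / (I*J))."""
    col_mass = I_k / (I_n * J)
    return np.sum((Z[i] / J - Z[ip] / J) ** 2 / col_mass)
```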
Construction of the cloud of levels
We need a distance between levels:
Two levels are close if many common individuals
take both of them
Rare levels are far away from the others
$$d^2_{k,k'} = I\sum_{i=1}^{I}\left(\frac{x_{ik}}{I_k}-\frac{x_{ik'}}{I_{k'}}\right)^2 = \sum_{i=1}^{I}\frac{1}{1/I}\left(\frac{x_{ik}/(IJ)}{I_k/(IJ)}-\frac{x_{ik'}/(IJ)}{I_{k'}/(IJ)}\right)^2$$

$$d^2_{\chi^2}(\text{column profile } j,\ \text{column profile } j') = \sum_{i=1}^{I}\frac{1}{f_{i\cdot}}\left(\frac{f_{ij}}{f_{\cdot j}}-\frac{f_{ij'}}{f_{\cdot j'}}\right)^2$$
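Expanding the square shows that this distance is driven by the individuals taking exactly one of the two levels, which is why rare levels end up far from the others. A numeric check with two hand-built dummy columns (Python sketch; the setup is ours):

```python
import numpy as np

I_n = 30
# Two levels k and k' of different variables: an individual may take both.
zk  = np.array([1] * 12 + [0] * 18)           # I_k  = 12
zkp = np.array([0] * 6 + [1] * 15 + [0] * 9)  # I_k' = 15
Ik, Ikp = zk.sum(), zkp.sum()

d2 = I_n * np.sum((zk / Ik - zkp / Ikp) ** 2)

# Closed form: I * #{individuals taking exactly one of the two levels}
# divided by I_k * I_k'.
only_one = np.sum(zk ^ zkp)
d2_closed = I_n * only_one / (Ik * Ikp)
```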
History (later)
MCA has been (re)discovered many times and can be seen from
different points of view:
PCA on a particular data table with particular weights for the
variables
CA on the Indicator Matrix
CA on the Burt Table
MCA is also known under several different names, such as
homogeneity analysis
Look at your data
library(FactoMineR)
data(tea)
summary(tea)
par(ask=TRUE)
for (i in 1:ncol(tea)) barplot(table(tea[,i]))
$$\text{Inertia of category } k = \frac{1}{J}\left(1-\frac{I_k}{I}\right)$$
How to deal with rare levels?
Delete individuals (not a good idea!)
Group levels
"Ventilation": allocate them at random to the other levels
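A minimal sketch of what "ventilation" does (our own Python helper, not the FactoMineR implementation): individuals of a rare level are reallocated at random to the remaining levels, proportionally to their frequencies.

```python
import numpy as np

def ventilate(column, threshold, rng):
    """Reassign individuals of rare levels (count < threshold) at random
    to the frequent levels, with probabilities proportional to counts."""
    column = column.copy()
    values, counts = np.unique(column, return_counts=True)
    keep = values[counts >= threshold]
    probs = counts[counts >= threshold] / counts[counts >= threshold].sum()
    rare = ~np.isin(column, keep)
    column[rare] = rng.choice(keep, size=rare.sum(), p=probs)
    return column

x = np.array(["a"] * 50 + ["b"] * 45 + ["c"] * 5)   # "c" is rare
y = ventilate(x, threshold=10, rng=np.random.default_rng(0))
```

After ventilation the rare level "c" disappears and its 5 individuals are spread over "a" and "b".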
Define active variables
18 variables — which kind of tea do you drink?
4 variables — who are you?
12 variables — which adjectives do you associate with tea?
Active variables: the way you drink tea
Supplementary: the others
How to deal with a continuous variable?
As supplementary information: project it on the dimensions and
compute its correlation with each dimension
As active information: cut the variable into categories
res.mca=MCA(tea,quanti.sup=19,quali.sup=20:36)
Graph of the individuals
plot(res.mca,invisible=c("var","quali.sup","quanti.sup"))
Distance between individual i and the barycenter:
$$d^2(x_{i\cdot},\, g) = \frac{I}{J}\sum_{k=1}^{K}\frac{x_{ik}}{I_k} - 1$$
Distance between individuals:
$$d^2(x_{i\cdot},\, x_{i'\cdot}) = \frac{I}{J}\sum_{k=1}^{K}\frac{(x_{ik}-x_{i'k})^2}{I_k}$$
[Figure: MCA factor map of the 300 individuals — Dim 1 (9.885 %), Dim 2 (8.103 %).]
Graph of the active levels
plot(res.mca,invisible=c("ind","quali.sup","quanti.sup"))
Distance between a level and the barycenter:
$$d^2(x_{\cdot k},\, g) = \frac{I}{I_k} - 1$$
Distance between levels:
$$d^2(x_{\cdot k},\, x_{\cdot k'}) = I\sum_{i=1}^{I}\left(\frac{x_{ik}}{I_k}-\frac{x_{ik'}}{I_{k'}}\right)^2 = I\times\frac{\text{nb of individuals taking only one of the two levels}}{I_k\, I_{k'}}$$
[Figure: MCA factor map of the 42 active levels (breakfast, tea time, Earl Grey, green, milk, sugar, tea bag, tea shop, p_upscale, ...) — Dim 1 (9.885 %), Dim 2 (8.103 %).]
Transition Formulae
$$F_s(i) = \frac{1}{\sqrt{\lambda_s}}\sum_{k}\frac{x_{ik}}{J}\,G_s(k)$$
Individual i is (up to 1/√λ_s) at the barycenter of its levels
$$G_s(k) = \frac{1}{\sqrt{\lambda_s}}\sum_{i}\frac{x_{ik}}{I_k}\,F_s(i)$$
Level k is (up to 1/√λ_s) at the barycenter of the individuals who
take this level
These formulae justify a simultaneous (superimposed) representation
of individuals and levels
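The transition formula can be verified on a small example by computing MCA as the CA of the indicator matrix (a numpy sketch under that convention; the deck itself relies on FactoMineR, and the simulated data are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
I_n, levels = 40, [3, 4, 2]
J, K = len(levels), sum(levels)
cols = [rng.permutation(np.arange(I_n) % L) for L in levels]
Z = np.hstack([np.eye(L)[c] for c, L in zip(cols, levels)])

# CA of the indicator matrix.
P = Z / (I_n * J)
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S, full_matrices=False)
ncp = int(np.sum(s > 1e-10))                     # K - J non-null dimensions
F = (U * s / np.sqrt(r)[:, None])[:, :ncp]       # individual coordinates
G = (Vt.T * s / np.sqrt(c)[:, None])[:, :ncp]    # level coordinates

# Transition formula: F_s(i) = (1/sqrt(lambda_s)) sum_k (x_ik / J) G_s(k),
# with sqrt(lambda_s) = s[s].
F_from_G = (Z / J) @ G / s[:ncp]
```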
Superimposed representation
plot(res.mca,invisible=c("quali.sup","quanti.sup"))
[Figure: superimposed MCA factor map of the 300 individuals and the 42 active levels — Dim 1 (9.885 %), Dim 2 (8.103 %).]
Interpretation of the location of the levels
plot(res.mca$ind$coord[,1:2],col=as.numeric(tea[,17]),pch=20)
legend("topright",legend=levels(tea[,17]),text.col=1:3,col=1:3)
# barycenter of the 3rd level of tea[,17] (colMeans of its individuals)
aa=by(res.mca$ind$coord,tea[,17],FUN=colMeans)[[3]][1:2]
points(aa[1],aa[2],col=3,pch=15,cex=1.5)
x11()
plot(res.mca$ind$coord[,1:2],col=as.numeric(tea[,18]),pch=20)
legend("topright",legend=levels(tea[,18]),text.col=1:6,col=1:6)
# barycenter of the 5th level of tea[,18]
bb=by(res.mca$ind$coord,tea[,18],FUN=colMeans)[[5]][1:2]
points(bb[1],bb[2],col=5,pch=15,cex=1.5)
Interpretation of the location of the levels
[Figures: individuals coloured by place of purchase (chain store, chain store+tea shop, tea shop), with the barycenter of one level highlighted; and individuals coloured by price category (p_branded, p_cheap, p_private label, p_unknown, p_upscale, p_variable).]
Supplementary levels representation
plot(res.mca,invisible=c("var","ind","quanti.sup"),cex=0.8)
[Figure: supplementary levels (sex, occupation, sportsman, age class, drinking frequency, and the perception items: healthy, diuretic, friendliness, feminine, sophisticated, slimming, exciting, relaxing, effect on health, ...) on the MCA factor map — Dim 1 (9.885 %), Dim 2 (8.103 %).]
Continuous supplementary variable
plot(res.mca,invisible=c("var","ind","quali.sup"),cex=0.8)
[Figure: the supplementary continuous variable age on the MCA factor map, represented through its correlations with Dim 1 (9.885 %) and Dim 2 (8.103 %).]
Description of the dimensions
By qualitative variables (F-test), categories (t-test) and
quantitative variables (correlation)
dimdesc(res.mca)
$`Dim 1`$quali
                  P-value
where        1.255462e-35
tearoom      6.082138e-32
how          1.273180e-23
friends      8.616289e-20
resto        2.319804e-18
tea.time     1.652462e-15
price        4.050469e-14
pub          5.846592e-12
work         3.000872e-09
How          4.796010e-07
Tea          8.970954e-07
lunch        1.570629e-06
frequency    1.849071e-06
friendliness 2.706357e-06
evening      5.586801e-05

$`Dim 1`$category
                        Estimate      P-value
tearoom                0.2973107 6.082138e-32
chain store+tea shop   0.3385378 1.755544e-25
friends                0.1995083 8.616289e-20
resto                  0.2080260 2.319804e-18
tea time               0.1701136 1.652462e-15
tea bag+unpackaged     0.2345703 6.851637e-14
pub                    0.1813713 5.846592e-12
work                   0.1417041 3.000872e-09
Not.work              -0.1417041 3.000872e-09
green                 -0.2456910 6.935593e-10
Not.pub               -0.1813713 5.846592e-12
Not.tea time          -0.1701136 1.652462e-15
tea bag               -0.2318245 3.979797e-16
Not.resto             -0.2080260 2.319804e-18
chain store           -0.2401244 1.254499e-18
Not.friends           -0.1995083 8.616289e-20
Not.tearoom           -0.2973107 6.082138e-32
Inertia
Variable inertia:
$$\text{Inertia}(j) = \sum_{k=1}^{K_j}\text{Inertia}(k) = \sum_{k=1}^{K_j}\frac{1}{J}\left(1-\frac{I_k}{I}\right) = \frac{K_j-1}{J}$$
The inertia is large when the variable has many levels
Remark: should we use variables with equal numbers of levels?
No: it doesn't matter, because the projected inertia of each
variable on each axis is bounded by 1/J.
Total inertia:
$$\text{total inertia} = \frac{K}{J}-1$$
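The summation collapses because the counts I_k of one variable sum to I. A pure-Python check (the counts are invented for illustration):

```python
# One categorical variable with K_j = 4 levels among J = 18 variables,
# observed on I = 300 individuals; the counts I_k must sum to I.
I_n, J = 300, 18
counts = [120, 90, 60, 30]
K_j = len(counts)

per_level = [(1 / J) * (1 - I_k / I_n) for I_k in counts]
variable_inertia = sum(per_level)    # collapses to (K_j - 1) / J
```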
Choice of the number of dimensions to interpret
res.mca$eig
Bar plot: difficult to make a choice
K − J non-null eigenvalues, $\sum_s \lambda_s = \frac{K}{J}-1$
Average eigenvalue: $\frac{1}{K-J}\sum_s \lambda_s = \frac{1}{J}$,
so a rule consists in keeping the eigenvalues greater than 1/J
Bootstrap confidence interval
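These facts about the eigenvalues can be checked on simulated data by computing MCA as the CA of the indicator matrix (numpy sketch; the data are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
I_n, levels = 60, [3, 4, 5]
J, K = len(levels), sum(levels)
cols = [rng.permutation(np.arange(I_n) % L) for L in levels]
Z = np.hstack([np.eye(L)[c] for c, L in zip(cols, levels)])

P = Z / (I_n * J)
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
eig = np.linalg.svd(S, compute_uv=False) ** 2
eig = eig[eig > 1e-10]
# len(eig) == K - J, eig.sum() == K/J - 1, eig.mean() == 1/J
```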
Why are the percentages of inertia small?
Individuals lie in $\mathbb{R}^{K-J}$. Generally, the percentage of
variance explained by the first axis is small.
Maximum percentage for one dimension:
$$\frac{\lambda_s \times 100}{\frac{K}{J}-1} \le \frac{100}{\frac{K}{J}-1} = \frac{J}{K-J}\times 100$$
With K = 100, J = 10: $\lambda_s$ explains at most 11.1 %
aa=as.factor(rep(1:10,each=100))
bb=cbind.data.frame(aa,aa,aa,aa,aa,aa,aa,aa,aa,aa)
colnames(bb)=paste("a",1:10,sep="")
res=MCA(bb)
res$eig[1:10,]
Why are the percentages of inertia small? (continued)
Moreover, the percentages are pessimistic!
If the percentages of inertia are calculated from the Burt table
analysis:
burt=t(tab.disjonctif(tea[,1:18]))%*%tab.disjonctif(tea[,1:18])
res.burt=CA(burt)
res.burt$eig[1:10,]
34.8 % explained by the first two axes instead of 19 % (with exactly the
same representations for the levels)
Benzécri and Greenacre noticed that this percentage is optimistic and
proposed coefficients to adjust the inertia
Helps to interpret
Contributions and cos² for the individuals and the levels
res.mca$ind$contrib
res.mca$ind$cos2
res.mca$var$contrib
res.mca$var$cos2
Extreme levels do not necessarily contribute the most (it depends
on the frequencies)
cos² values are very small... but this was expected, since the inertia
is small
Variable contribution: $CTR(j) = \sum_k CTR(k)$
Remark:
$$\eta^2(F_s, j) = CTR(j)\times J\times\lambda_s$$
Remarks
Return to your data with contingency tables and
correspondence analysis
Non-linear relationships can be highlighted
Guttman effect
MCA as a pre-processing for clustering
A pre-processing for Clustering
Transformation of the data: categorical → continuous
Principal components (the individuals' coordinates) are synthetic
variables, the most linked to the other variables:
$$F_s = \arg\max_v \frac{1}{J}\sum_j \eta^2(v, j)$$
"Denoising": retain only 95 % of the information
Clustering on the individuals' coordinates (with variance $\lambda_s$):
hierarchical clustering with Ward's criterion (based on inertia)
Classification (Fisher's linear discriminant) on the individuals'
coordinates (with variance $\lambda_s$)
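The pipeline (MCA coordinates → Ward hierarchical clustering) can be sketched outside R as well; here synthetic points stand in for res.mca$ind$coord, and scipy's `linkage`/`fcluster` play the role of agnes/cutree:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for MCA individual coordinates:
# three well-separated groups in a 5-dimensional component space.
rng = np.random.default_rng(4)
coords = np.vstack([
    rng.normal(loc=center, scale=0.1, size=(20, 5))
    for center in (0.0, 3.0, 6.0)
])

# Ward's criterion merges, at each step, the two clusters whose
# fusion increases the within-cluster inertia the least.
tree = linkage(coords, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")
```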
Hierarchical Clustering
res.mca=MCA(tea,quanti.sup=19,quali.sup=20:36,ncp=20,graph=F)
library(cluster)
classif = agnes(res.mca$ind$coord,method="ward")
plot(classif,main="Dendrogram",ask=F,which.plots=2,labels=FALSE)
[Figure: dendrogram of the hierarchical clustering (Ward) on res.mca$ind$coord — Agglomerative Coefficient = 0.9.]
Represent the clusters on your factorial map
clust = cutree(classif,k=3)
tea.comp = cbind.data.frame(tea,res.mca$ind$coord[,1:3],factor(clust))
res.aux=MCA(tea.comp,quanti.sup=c(19,37:39),quali.sup=c(20:36,40),graph=F)
plot(res.aux,invisible=c("quali.sup","var","quanti.sup"),habillage=40)
[Figure: MCA factor map of the 300 individuals coloured by the 3 clusters — Dim 1 (9.885 %), Dim 2 (8.103 %).]
Describe each cluster
catdes(tea.comp,ncol(tea.comp))
$test.chi
                  P.value df
where        2.316552e-49  4
how          3.592323e-35  4
price        1.142914e-31 10
How          5.884403e-10  6
tearoom      1.025632e-09  2
dinner       3.874810e-09  2
friends      1.859075e-06  2

$category$`2`
                      Cla/Mod    Mod/Cla    Global      p.value     V-test
price=p_upscale    0.73584906  0.6610169  0.1766667 1.467589e-22   9.702737
where=tea shop     0.90000000  0.4576271  0.1000000 6.532117e-19   8.805180
how=unpackaged     0.77777778  0.4745763  0.1200000 3.184180e-16   8.082056
dinner=dinner      0.71428571  0.2542373  0.0700000 1.094845e-07   5.182468
Tea=Earl Grey      0.11917098  0.3898305  0.6433333 8.396333e-06  -4.303756
how=tea bag        0.09411765  0.2711864  0.5666667 3.162813e-07  -4.981001
dinner=Not.dinner  0.15770609  0.7457627  0.9300000 1.094845e-07  -5.182468
where=chain store  0.08854167  0.2881356  0.6400000 7.994965e-10  -6.034051

$quanti$`2`
        v.test  Mean in category  Overall mean  sd in category  Overall sd
Dim.2 12.92675         0.5267543  6.280824e-17       0.3746555   0.3486355
