
MANAGERIAL DATA ANALYSIS

Case Study: Catering Business
Student: Popescu Laura Elena

CASE STUDY: CATERING BUSINESS


A trader runs a network of catering business units that employs 40 vendors. The 40 vendors, treated as the general population, were surveyed in alphabetical order and each was assigned a code (Crt. No.). We considered 2 variables:
- Variable x: the number of worked hours
- Variable y: the amount of the monthly net salary (lei)

Table 1: Collected Data


Crt. No   Age   Hours Worked   Net Monthly Salary (lei)
1         19    150            1313
2         20    170            1488
3         22    190            1663
4         21    184            1472
5         26    180            1575
6         24    188            1504
7         23    174            1392
8         25    178            1558
9         27    168            1470
10        21    196            1715
11        23    176            1540
12        23    183            1601
13        20    157            1374
14        24    175            1531
15        21    191            1671
16        25    184            1610
17        22    147            1286
18        26    135            1181
19        23    187            1636
20        20    166            1453
21        19    146            1278
22        24    158            1383
23        20    157            1374
24        25    173            1514
25        21    170            1488
26        26    180            1575
27        22    169            1479
28        27    168            1470
29        22    211            1846
30        23    176            1540
31        19    160            1400
32        21    150            1313
33        20    165            1444
34        21    140            1225
35        21    151            1321
36        24    193            1689
37        25    178            1558
38        26    135            1181
39        23    187            1636
40        24    179            1566

The average, standard deviation and the coefficient of variation:


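1. a. The average worked hours of the vendors:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{6825}{40} = 170.63 \text{ hours} \]

b. The average monthly net salary:

\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{59313}{40} = 1482.83 \text{ lei} \]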
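As a quick check, the two averages can be recomputed directly from Table 1. The following is a minimal Python sketch; the list names `hours` and `salary` and the helper `summarize` are illustrative and not part of the case study. It assumes the sample form of the standard deviation (divisor n - 1) and also reports the coefficient of variation, since both build on the same averages.

```python
from statistics import mean, stdev

# Table 1, in Crt. No. order
hours = [150, 170, 190, 184, 180, 188, 174, 178, 168, 196,
         176, 183, 157, 175, 191, 184, 147, 135, 187, 166,
         146, 158, 157, 173, 170, 180, 169, 168, 211, 176,
         160, 150, 165, 140, 151, 193, 178, 135, 187, 179]
salary = [1313, 1488, 1663, 1472, 1575, 1504, 1392, 1558, 1470, 1715,
          1540, 1601, 1374, 1531, 1671, 1610, 1286, 1181, 1636, 1453,
          1278, 1383, 1374, 1514, 1488, 1575, 1479, 1470, 1846, 1540,
          1400, 1313, 1444, 1225, 1321, 1689, 1558, 1181, 1636, 1566]


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation in %)."""
    m = mean(values)
    s = stdev(values)  # assumption: sample form, divisor n - 1
    return m, s, s / m * 100


for name, values in (("hours", hours), ("salary", salary)):
    m, s, cv = summarize(values)
    print(f"{name}: mean={m:.2f}  sd={s:.2f}  CV={cv:.2f}%")
```

A coefficient of variation below the 35% threshold used in this case study indicates that the mean is representative of the data.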

2. a. The standard deviation for the number of worked hours:

\[ \sigma_x = \sqrt{\sigma_x^2} = 17.51 \]

This result tells us that the vendors' worked hours deviate from the average by about plus/minus 17.51 hours.

b. The standard deviation for the amount of the vendors' monthly net salary:

\[ \sigma_y = \sqrt{\sigma_y^2} = 150.52 \]

This result tells us that the vendors' monthly net salaries deviate from the average by about plus/minus 150.52 lei.

3. a. The coefficient of variation of the worked hours:

\[ C_v = \frac{\sigma_x}{\bar{x}} \cdot 100 = 10.26\% \]

b. The coefficient of variation of the monthly net salary:

\[ C_v = \frac{\sigma_y}{\bar{y}} \cdot 100 = 10.15\% \]

Because both coefficients of variation are below the 35% threshold, we can conclude that the average worked hours and the average monthly net salary are representative and that the data are homogeneous.

Table 2: The Average, Standard Error and Standard Deviation

                 Worked Hours   Salary
Minimum          135            1181
Maximum          211            1846
Max-Min          76             665
No. of Classes   5              5
Range            15.2           133

Frequency distribution
The 40 vendors will be grouped in 5 intervals with the calculated range of 15.2 for the variable x and of 133 for the variable y.

Worked hours (x)
LCL      UCL      Frequency   Midpoint
135      150.2    7           142.6
150.2    165.4    6           157.8
165.4    180.6    16          173
180.6    195.8    9           188.2
195.8    211      2           203.4

Monthly net salary (y)
LCL      UCL      Frequency   Midpoint
1181     1314     7           1247.5
1314     1447     7           1380.5
1447     1580     17          1513.5
1580     1713     7           1646.5
1713     1846     2           1779.5
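The grouping above can be reproduced mechanically. The sketch below is one possible Python version, not the procedure the author used: it assumes equal-width classes that are closed on the left, with the maximum value assigned to the last class. The function name `class_frequencies` is illustrative.

```python
# Table 1, in Crt. No. order
hours = [150, 170, 190, 184, 180, 188, 174, 178, 168, 196,
         176, 183, 157, 175, 191, 184, 147, 135, 187, 166,
         146, 158, 157, 173, 170, 180, 169, 168, 211, 176,
         160, 150, 165, 140, 151, 193, 178, 135, 187, 179]
salary = [1313, 1488, 1663, 1472, 1575, 1504, 1392, 1558, 1470, 1715,
          1540, 1601, 1374, 1531, 1671, 1610, 1286, 1181, 1636, 1453,
          1278, 1383, 1374, 1514, 1488, 1575, 1479, 1470, 1846, 1540,
          1400, 1313, 1444, 1225, 1321, 1689, 1558, 1181, 1636, 1566]


def class_frequencies(values, n_classes=5):
    """Return a list of (LCL, UCL, frequency, midpoint) for equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes  # the "Range" row of Table 2
    counts = [0] * n_classes
    for v in values:
        # Classes are [LCL, UCL); the maximum value falls into the last class.
        k = min(int((v - lo) / width), n_classes - 1)
        counts[k] += 1
    rows = []
    for k, f in enumerate(counts):
        lcl = lo + k * width
        ucl = lcl + width
        rows.append((lcl, ucl, f, (lcl + ucl) / 2))
    return rows


for label, values in (("Worked hours", hours), ("Salary", salary)):
    print(label)
    for lcl, ucl, f, mid in class_frequencies(values):
        print(f"  {lcl:8.1f} {ucl:8.1f} {f:3d} {mid:8.1f}")
```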

1. a. The arithmetic mean of worked hours of the n = 40 vendors, where x_i is the midpoint of class i and f_i its frequency:

\[ \bar{x} = \frac{\sum x_i f_i}{n} = 170.34 \]

b. The arithmetic mean of the amount of the monthly net salary of the n = 40 vendors:

\[ \bar{y} = \frac{\sum y_i f_i}{n} = 1480.25 \]

2. a. The standard deviation for variable x:

\[ \sigma_x = \sqrt{\sigma_x^2} = 16.96 \]

This result tells us that the grouped worked hours deviate from their mean by about plus/minus 16.96 hours.

b. The standard deviation for variable y:

\[ \sigma_y = \sqrt{\sigma_y^2} = 144.93 \]

This result tells us that the grouped monthly net salaries deviate from their mean by about plus/minus 144.93 lei.

3. a. For variable x, the number of worked hours, the coefficient of variation is:

\[ C_v = \frac{\sigma_x}{\bar{x}} \cdot 100 = 9.95\% \]

b. For variable y, the amount of the monthly net salary, the coefficient of variation is:

\[ C_v = \frac{\sigma_y}{\bar{y}} \cdot 100 = 9.79\% \]
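The grouped measures depend only on the class midpoints and frequencies of the frequency distribution. A minimal Python sketch follows; it assumes the population form of the variance (divisor n, matching the divisor in the grouped mean), and the name `grouped_stats` is illustrative.

```python
from math import sqrt

# Class midpoints and frequencies taken from the frequency distribution
hours_mid = [142.6, 157.8, 173.0, 188.2, 203.4]
hours_freq = [7, 6, 16, 9, 2]
salary_mid = [1247.5, 1380.5, 1513.5, 1646.5, 1779.5]
salary_freq = [7, 7, 17, 7, 2]


def grouped_stats(midpoints, freqs):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    n = sum(freqs)
    m = sum(x * f for x, f in zip(midpoints, freqs)) / n
    # Assumption: population variance of grouped data, divisor n
    var = sum(f * (x - m) ** 2 for x, f in zip(midpoints, freqs)) / n
    sd = sqrt(var)
    return m, sd, sd / m * 100


for name, mid, freq in (("hours", hours_mid, hours_freq),
                        ("salary", salary_mid, salary_freq)):
    m, sd, cv = grouped_stats(mid, freq)
    print(f"{name}: mean={m:.2f}  sd={sd:.2f}  CV={cv:.2f}%")
```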

Because the coefficients of variation of the two variables are below 35%, the grouping of x and y is representative.

Table 3: The frequency distributions

[Figure: Frequency Distribution Of Worked Hours. Bar chart; x-axis: Range Intervals, y-axis: Frequency.]

[Figure: Frequency Distribution Of The Monthly Net Salary. Bar chart; x-axis: Range Intervals, y-axis: Frequency.]

These charts give us useful information about the shape of the distribution. As we can see above, for both variables the largest number of observations falls in the third interval.

The Empirical Rule

The empirical rule: Worked Hours
Interval    Worked Hours (lower)   Worked Hours (upper)   Frequency   %
x̄ ± 1σ      153.1179501            188.1320499            27          67.5
x̄ ± 2σ      135.6109003            205.6390997            37          92.5

The empirical rule: Monthly Net Salary
Interval    Salary (lower)         Salary (upper)         Frequency   %
ȳ ± 1σ      1332.300977            1633.349023            25          62.5
ȳ ± 2σ      1181.776955            1783.873045            37          92.5

The empirical rule states that for a normal distribution:
- 68% of the data will fall within 1 standard deviation of the mean
- 95% of the data will fall within 2 standard deviations of the mean
- Almost all (99.7%) of the data will fall within 3 standard deviations of the mean
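To compare the data with the rule, one can count how many observations fall between both bounds of each interval. The sketch below is a minimal Python version; it assumes the sample standard deviation (divisor n - 1), and the helper name `coverage` is illustrative.

```python
from statistics import mean, stdev

# Table 1, in Crt. No. order
hours = [150, 170, 190, 184, 180, 188, 174, 178, 168, 196,
         176, 183, 157, 175, 191, 184, 147, 135, 187, 166,
         146, 158, 157, 173, 170, 180, 169, 168, 211, 176,
         160, 150, 165, 140, 151, 193, 178, 135, 187, 179]
salary = [1313, 1488, 1663, 1472, 1575, 1504, 1392, 1558, 1470, 1715,
          1540, 1601, 1374, 1531, 1671, 1610, 1286, 1181, 1636, 1453,
          1278, 1383, 1374, 1514, 1488, 1575, 1479, 1470, 1846, 1540,
          1400, 1313, 1444, 1225, 1321, 1689, 1558, 1181, 1636, 1566]


def coverage(values, k):
    """Return (lower, upper, count, percent) for the interval mean ± k·sd."""
    m = mean(values)
    s = stdev(values)  # assumption: sample form, divisor n - 1
    lower, upper = m - k * s, m + k * s
    # An observation counts only if it lies between BOTH bounds.
    inside = sum(1 for v in values if lower <= v <= upper)
    return lower, upper, inside, 100 * inside / len(values)


for name, values in (("hours", hours), ("salary", salary)):
    for k in (1, 2):
        lower, upper, inside, pct = coverage(values, k)
        print(f"{name} ±{k}σ: [{lower:.2f}, {upper:.2f}]  n={inside}  {pct:.1f}%")
```

Counting only the values below the upper bound, without also checking the lower bound, would overstate the share of data inside each interval.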

 Regression
Relationship Between Worked Hours And The Monthly Net Salary
[Figure: scatter plot with fitted trend line; x-axis: Worked Hours, y-axis: Salary. Trend line: y = 8.3514x + 57.868, R² = 0.9435]

From this scatter diagram we can see that the points lie almost on a straight line, so a linear model fits the data well. This means that the two variables are strongly related: the amount of the monthly net salary depends on the number of hours worked in the company.

The Analysis of variance (ANOVA)


Regression Statistics
Multiple R          0.971328786
R Square            0.94347961
Adjusted R Square   0.941992231
Standard Error      36.25341277
Observations        40

Because Multiple R = 0.97, we can conclude that there is a strong positive linear relationship between the vendors' monthly salary and the number of hours they work. The Standard Error shows that the observed salaries differ from the values estimated by the regression line by about 36.25 lei on average.
ANOVA
             df   SS            MS            F          Significance F
Regression   1    833697.9974   833697.9974   634.3237   2.58866E-25
Residual     38   49943.77762   1314.309937
Total        39   883641.775

               Coefficients   Standard Error   t Stat        P-value    Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept      57.868         56.86746539      1.017593443   0.315307   -57.25420516   172.9901249   -57.25420516   172.9901249
Hours Worked   8.3514         0.331591679      25.18578452   2.59E-25   7.680124313    9.022668832   7.680124313    9.022668832

From the table above, we can see that the intercept is 57.868. This means that when the explanatory variable is 0, that is, when the number of worked hours is 0, the estimated monthly net salary is 57.868 lei. Because its P-value is high (0.31), this coefficient is not statistically significant. The slope coefficient is b = 8.35, which means that each additional hour worked increases the monthly net salary by 8.35 lei on average.
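The coefficients and the ANOVA decomposition follow from a handful of sums over Table 1. The following is a minimal sketch in plain Python, assuming ordinary least squares with a single explanatory variable; all variable names are illustrative.

```python
from math import sqrt

# Table 1, in Crt. No. order
hours = [150, 170, 190, 184, 180, 188, 174, 178, 168, 196,
         176, 183, 157, 175, 191, 184, 147, 135, 187, 166,
         146, 158, 157, 173, 170, 180, 169, 168, 211, 176,
         160, 150, 165, 140, 151, 193, 178, 135, 187, 179]
salary = [1313, 1488, 1663, 1472, 1575, 1504, 1392, 1558, 1470, 1715,
          1540, 1601, 1374, 1531, 1671, 1610, 1286, 1181, 1636, 1453,
          1278, 1383, 1374, 1514, 1488, 1575, 1479, 1470, 1846, 1540,
          1400, 1313, 1444, 1225, 1321, 1689, 1558, 1181, 1636, 1566]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(salary) / n

# Sums of squares and cross-products around the means
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in salary)  # total sum of squares
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, salary))

b = sxy / sxx              # slope
a = mean_y - b * mean_x    # intercept

ss_reg = b * sxy           # regression sum of squares (1 d.f.)
ss_res = syy - ss_reg      # residual sum of squares (n - 2 d.f.)
ms_res = ss_res / (n - 2)

r_square = ss_reg / syy
std_error = sqrt(ms_res)   # standard error of the regression
f_stat = ss_reg / ms_res

s_b = sqrt(ms_res / sxx)                             # standard error of b
s_a = sqrt(ms_res * (1 / n + mean_x ** 2 / sxx))     # standard error of a

print(f"y = {b:.4f}x + {a:.3f}")
print(f"R Square = {r_square:.4f}, Standard Error = {std_error:.2f}, F = {f_stat:.2f}")
print(f"S_a = {s_a:.2f}, S_b = {s_b:.4f}")
```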

I. Coefficient of correlation test.
1. Formulation of hypotheses: H0: r = 0; H1: r ≠ 0.
2. We choose a confidence level of 95% (significance level α = 0.05).
3. Because the number of observations (40) is higher than 30, we will use the Student test (z) with n - 2 = 38 degrees of freedom.
4. \( z_{tab} = z_{\alpha/2;\,n-2} = z_{0.025;\,38} = 2.02 \)
5. \( z_{calc} = r \sqrt{\dfrac{n-2}{1-r^2}} = 0.97 \sqrt{\dfrac{38}{0.06}} = 24.41 \)
6. The critical region: \( |z_{calc}| > |z_{tab}| \)
7. Since \( |z_{calc}| > |z_{tab}| \) for a significance level α = 0.05, we reject the null hypothesis and accept the alternative hypothesis; therefore the model is valid.

II. Testing the significance of the a parameter:
1. Formulation of hypotheses: H0: α = 0; H1: α ≠ 0.
2. We choose a confidence level of 95% (significance level α = 0.05).
3. Because the number of observations is higher than 30, we will use the Student test (z) with n - 2 degrees of freedom (in our case 38 d.f.).
4. \( z_{tab} = z_{\alpha/2;\,n-2} = z_{0.025;\,38} = 2.02 \)
5. \( z_{calc} = \dfrac{a - \alpha}{S_a} = \dfrac{a - 0}{S_a} = \dfrac{57.868}{56.87} = 1.02 \)

Where S_a is the standard error of the parameter a (the Standard Error of the Intercept in the regression output):

\[ S_a = \sqrt{\frac{SSE}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}\right)} = 56.87 \]

6. The critical region: \( |z_{calc}| > |z_{tab}| \)
7. Since \( |z_{calc}| < |z_{tab}| \) for a significance level α = 0.05, we cannot reject the null hypothesis, so the parameter a is not statistically significant. This agrees with the P-value of 0.315 reported for the intercept.

III. Testing the significance of the b parameter:
1. Formulation of hypotheses: H0: β = 0; H1: β ≠ 0.

2. We choose a confidence level of 95% (significance level α = 0.05).
3. Because the number of observations is higher than 30, we will use the Student test (z) with n - 2 degrees of freedom (in our case 38 d.f.).
4. \( z_{tab} = z_{\alpha/2;\,n-2} = z_{0.025;\,38} = 2.02 \)
5. \( z_{calc} = \dfrac{b}{S_b} = \dfrac{8.3514}{0.3316} = 25.19 \) (the t Stat reported for Hours Worked in the regression output)

Where S_b is the standard error of the parameter b:

\[ S_b = \sqrt{\frac{SSE/(n-2)}{\sum (x_i - \bar{x})^2}} = 0.3316 \]

6. The critical region: \( |z_{calc}| > |z_{tab}| \)
7. Since \( |z_{calc}| > |z_{tab}| \) for a significance level α = 0.05, we reject the null hypothesis and accept the alternative hypothesis: the parameter b is statistically significant.
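The three tests share the same decision rule, so they can be checked together from the figures in the regression output. The sketch below is illustrative only. The critical value `T_CRIT = 2.024` is an assumption taken from a standard two-sided Student table for α = 0.05 and 38 degrees of freedom, and the sketch uses the unrounded Multiple R, so its statistic for r differs slightly from a hand calculation with r rounded to 0.97.

```python
from math import sqrt

# Figures copied from the regression output
n = 40
r = 0.971328786                  # Multiple R
a, s_a = 57.868, 56.86746539     # intercept and its standard error
b, s_b = 8.3514, 0.331591679     # slope and its standard error

# Assumption: two-sided Student critical value, alpha = 0.05, n - 2 = 38 d.f.
T_CRIT = 2.024

statistics_by_test = {
    "correlation r": r * sqrt((n - 2) / (1 - r ** 2)),
    "parameter a": a / s_a,
    "parameter b": b / s_b,
}

# Reject H0 only when the statistic falls in the critical region |t| > T_CRIT.
decisions = {name: abs(t) > T_CRIT for name, t in statistics_by_test.items()}

for name, t in statistics_by_test.items():
    verdict = "reject H0" if decisions[name] else "do not reject H0"
    print(f"{name}: statistic = {t:.2f} -> {verdict}")
```

In a simple regression the statistic for r and the statistic for b coincide, which gives a convenient consistency check on the hand calculations.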