Names :- Gaurav Mohta (90), Priyanjali Moulik (91), Ameya Rege (99), Tanuj Nabar (93), Kunal Kashyap (81)
Class & Batch :- PGDM 2
Date of submission :- Saturday, 16 Nov, 2013

Multiple Regression Theory

Multiple regression is regression with two or more independent variables on the right-hand side of the equation. Use multiple regression if more than one cause is associated with the effect you wish to understand.

For prediction: Multiple regression lets you use more than one factor to make a prediction.
For explanation: Multiple regression lets you separate causal factors, analyzing each one's influence on what you are trying to explain.

The equation and the true plane

For the case of two independent variables, you can write the equation for a multiple regression model this way:

Y = α + βX + γZ + error

Imagine that the X- and Z-axes are on a table in front of you, with the X-axis pointing to the right and the Z-axis pointing directly away from you. The Y-axis is standing vertically, straight up from the table. Y = α + βX + γZ is the formula for a flat plane that is floating in the three-dimensional space above the table.

α is the height of the plane above the point on the table where X=0 and Z=0.
β is the slope of the plane in the X direction, how fast the plane rises as you go to the right. If β is bigger than 0, the plane is tilted so that the part to your right is higher than the part to your left.
γ is the slope of the plane in the Z direction, how fast the plane rises as it goes away from you. If γ is bigger than 0, the plane is tilted toward you. If γ is negative, the plane is tilted away from you.

The error, in Y = α + βX + γZ + error, means that the data points do not lie right on this plane. Instead, the data points form a cluster or a cloud above and below the plane described by Y = α + βX + γZ. When we collect data, we do not get to see the true plane. All we have is the cloud of points, floating in space.
Multiple regression with two independent variables tries to find the plane that best fits that cloud of points, in the hope that the regression plane will be close to the true plane.
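The idea of fitting a plane to a cloud of points can be sketched in a few lines of Python. This is only an illustration on made-up data: the "true plane" (intercept 2.0, slopes 0.5 and -1.5), the noise level and the names fit_plane, alpha_hat, beta_hat and gamma_hat are all assumptions chosen for the example, not values from our football data.

```python
import random

def fit_plane(points):
    """Least-squares fit of y = a + b*x + c*z using the normal equations."""
    # Build the 3x3 system (A^T A) coef = A^T y, where each row of A is [1, x, z].
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for x, z, y in points:
        row = (1.0, x, z)
        for i in range(3):
            aty[i] += row[i] * y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * 3
    for i in (2, 1, 0):
        s = m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = s / m[i][i]
    return coef

random.seed(0)
TRUE_PLANE = (2.0, 0.5, -1.5)  # hypothetical intercept and slopes

# The cloud of points: the true plane plus a random error on each observation.
points = []
for _ in range(200):
    x = random.uniform(0, 10)
    z = random.uniform(0, 10)
    error = random.gauss(0, 0.5)
    y = TRUE_PLANE[0] + TRUE_PLANE[1] * x + TRUE_PLANE[2] * z + error
    points.append((x, z, y))

alpha_hat, beta_hat, gamma_hat = fit_plane(points)
print(alpha_hat, beta_hat, gamma_hat)
```

The fitted values should land near, but not exactly on, the hypothetical true plane, which is the point made above: the regression plane is only an estimate of the plane we never get to see.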
Our regression model is aimed at finding out how the number of goals scored by A-listed football players depends on factors such as matches played, assists, yellow cards, red cards, times substituted on and times substituted off. The sample size was determined on the basis of certain criteria of the players.
Now, here in our model,
Dependent variable (y) = Number of goals scored by the four players in different seasons
Independent variables
(x1) = Number of matches played
(x2) = Number of assists
(x3) = Number of yellow cards they each got issued
(x4) = Number of red cards they each got issued
(x5) = Number of times they each got substituted on
(x6) = Number of times they each got substituted off
y = 0.72x1 - 0.604x2 + 0.33x3 - 4.62x4 - 2.435x5 - 3.037x6 + 32.405
where, y = Goals scored
x1 = Number of matches
x2 = Number of assists by the player
x3 = Number of Yellow Cards given to the player
x4 = Number of Red Cards given to the player
x5 = Number of times the player is substituted on the team
x6 = Number of times the player is substituted off the team
For example, the number of goals that will be scored in a season by a player according to this model under the following conditions:
Number of matches = 30
Number of assists by player = 14
Number of Yellow Cards given to the player = 1
Number of Red Cards given to the player = 0
Number of times the player is substituted on the team = 3
Number of times the player is substituted off the team = 2
is given by:
y = 32.52550589
Approximately, 33
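The substitution can be reproduced with a short Python sketch. It uses the rounded coefficients as printed in the equation, so it gives about 32.5 rather than 32.52550589, which presumably comes from the unrounded coefficients. The names INTERCEPT, COEFFICIENTS and predict_goals are only illustrative.

```python
# Rounded coefficients as printed in the regression equation.
INTERCEPT = 32.405
COEFFICIENTS = {
    "matches": 0.72,            # x1
    "assists": -0.604,          # x2
    "yellow_cards": 0.33,       # x3
    "red_cards": -4.62,         # x4
    "substituted_on": -2.435,   # x5
    "substituted_off": -3.037,  # x6
}

def predict_goals(season):
    """Return the predicted goals for one season (dict of the six factors)."""
    return INTERCEPT + sum(COEFFICIENTS[name] * season[name] for name in COEFFICIENTS)

example = {
    "matches": 30,
    "assists": 14,
    "yellow_cards": 1,
    "red_cards": 0,
    "substituted_on": 3,
    "substituted_off": 2,
}

goals = predict_goals(example)
print(round(goals, 3))
```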
To check our model at 5% level of significance,
Null hypothesis: H0: All regression coefficients are zero
Alternate hypothesis: H1: Not all regression coefficients are zero (at least one is different from zero)
F Method
F = 5.210223049
F crit = 3.37 {F(6,9) at 5% level of significance}
.: F > F crit
.: Reject H0
.: Conclude that not all coefficients are zero
Significance-F method
Significance-F = 0.014113619
As it is smaller than 0.05 (the 5% level of significance), reject H0
.: Conclude that not all coefficients are zero
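Both decision rules for the overall test can be written out as a small Python check. The figures are the ones reported above; the variable names are illustrative, and the sample size is inferred from the degrees of freedom F(6,9) on the assumption that the model includes an intercept.

```python
ALPHA = 0.05                  # 5% level of significance
F_STATISTIC = 5.210223049     # reported F
F_CRITICAL = 3.37             # reported critical value for F(6, 9)
SIGNIFICANCE_F = 0.014113619  # reported Significance-F (p-value of the F test)

# F(6, 9): 6 regression degrees of freedom (one per x) and 9 residual ones.
DF_REGRESSION = 6
DF_RESIDUAL = 9
# With an intercept in the model, n = regression df + residual df + 1.
n_observations = DF_REGRESSION + DF_RESIDUAL + 1

# Rule 1: reject H0 when the F statistic exceeds the critical value.
reject_by_f = F_STATISTIC > F_CRITICAL
# Rule 2: reject H0 when Significance-F is below the chosen level.
reject_by_significance_f = SIGNIFICANCE_F < ALPHA

print(n_observations, reject_by_f, reject_by_significance_f)
```

The two rules are two views of the same test, so they agree here: H0 is rejected either way.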
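t Distribution method
t-values:
x1 = 1.088923443
x2 = -0.520578462
x3 = 0.240927386
x4 = -0.577088367
x5 = -2.873058225
x6 = -1.576609914
t crit = 1.833 {t(9) at 5% level of significance, one-tailed; the two-tailed value is 2.262}
Here each factor is tested separately, with H0: the coefficient of that factor is zero. Only x5 has |t| greater than t crit (2.873), so H0 is rejected for x5 alone. For x1, x2, x3, x4 and x6, |t| is less than t crit, so H0 cannot be rejected for those factors.
.: Conclude that the coefficient of x5 is not zero, so not all coefficients are zero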
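The comparison of each t-value with the critical value can be sketched as follows. The t-values and the 1.833 critical value are the reported ones; 2.262 is the standard two-tailed table value for t(9) at 5%, added here for comparison. The function name significant is only illustrative.

```python
# Reported t-values, in the order x1 .. x6.
T_VALUES = {
    "x1": 1.088923443,
    "x2": -0.520578462,
    "x3": 0.240927386,
    "x4": -0.577088367,
    "x5": -2.873058225,
    "x6": -1.576609914,
}

T_CRIT_ONE_TAILED = 1.833  # t(9) at 5%, as used in the report
T_CRIT_TWO_TAILED = 2.262  # t(9) at 5%, two-tailed table value (not in the report)

def significant(t_values, t_crit):
    """Return the factors whose |t| exceeds the critical value."""
    return [name for name, t in t_values.items() if abs(t) > t_crit]

print(significant(T_VALUES, T_CRIT_ONE_TAILED))
print(significant(T_VALUES, T_CRIT_TWO_TAILED))
```

With either critical value, x5 is the only factor whose |t| exceeds it.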
Error method to find factors of dependence:
for x1, i.e. number of matches played, p value = 0.30448508
.: Error percentage = 30.44850803 %
for x2, i.e. number of assists by the player, p value = 0.615223742
.: Error percentage = 61.52237417 %
for x3, i.e. number of yellow cards given to the player, p value = 0.815010301
.: Error percentage = 81.50103014 %
for x4, i.e. number of red cards given to the player, p value = 0.578027352
.: Error percentage = 57.80273521 %
for x5, i.e. number of times the player is substituted on the team, p value = 0.018384826
.: Error percentage = 1.838482588 %
for x6, i.e. number of times the player is substituted off the team, p value = 0.149339776
.: Error percentage = 14.93397762 %
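The conversion from p value to error percentage, and a ranking of the factors from lowest to highest error, can be sketched as below. The p values are the reported ones; the 5% cut-off is the level of significance used for the earlier tests, and the names error_percent, ranked and below_alpha are illustrative.

```python
ALPHA = 0.05  # 5% level of significance

# Reported p values for x1 .. x6.
P_VALUES = {
    "x1": 0.30448508,   # matches played
    "x2": 0.615223742,  # assists
    "x3": 0.815010301,  # yellow cards
    "x4": 0.578027352,  # red cards
    "x5": 0.018384826,  # substituted on
    "x6": 0.149339776,  # substituted off
}

# Error percentage is simply the p value expressed in percent.
error_percent = {name: p * 100 for name, p in P_VALUES.items()}

# Factors ordered from lowest to highest error.
ranked = sorted(P_VALUES, key=P_VALUES.get)

# Factors whose error is below the 5% level.
below_alpha = [name for name in ranked if P_VALUES[name] < ALPHA]

for name in ranked:
    print(name, round(error_percent[name], 2), "%")
print(below_alpha)
```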
Conclusion:- Each coefficient gives the change in goals for a one-unit change in that factor, with the other factors held constant.
If one more match is played, then the number of goals goes up by 0.72, i.e. 1 approximately.
If there is one more assist by the player, then the number of goals goes down by 0.604, i.e. 1 approximately.
If there is one more yellow card issued, then the number of goals goes up by 0.33.
If there is one more red card issued, then the number of goals goes down by 4.62, i.e. 5 approximately.
If the player is substituted on one more time, then the number of goals goes down by 2.435, i.e. 2 approximately.
If the player is substituted off one more time, then the number of goals goes down by 3.037, i.e. 3 approximately.
Hence, we find that x2, x3 and x4 have a high error value when used as factors to determine the value of y.
Hence, the number of assists by the player and the number of yellow and red cards given to the player cannot be determining factors for the number of goals scored by him in the season.
.: The new equation of regression is:
y = 0.72x1 - 2.435x5 - 3.037x6 + 32.405
.: The number of goals that will be scored in a season by a player according to this model under the following conditions:
Number of matches = 30
Number of times the player is substituted on the team = 3
Number of times the player is substituted off the team = 2
is given by:
y = 40.64899767
Approximately, 41
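The effect of dropping x2, x3 and x4 on the example prediction can be checked with a short sketch. It reuses the rounded coefficients of the original equation, as the new equation above does, so it gives about 40.63 rather than 40.64899767. The names FULL, KEPT and contribution are illustrative.

```python
INTERCEPT = 32.405

# Rounded coefficients of the original six-factor equation.
FULL = {"x1": 0.72, "x2": -0.604, "x3": 0.33, "x4": -4.62, "x5": -2.435, "x6": -3.037}
# Factors kept in the new equation.
KEPT = ("x1", "x5", "x6")

# Example season: 30 matches, 14 assists, 1 yellow card, 0 red cards,
# substituted on 3 times, substituted off 2 times.
season = {"x1": 30, "x2": 14, "x3": 1, "x4": 0, "x5": 3, "x6": 2}

def contribution(names):
    """Sum of coefficient * value over the given factors."""
    return sum(FULL[name] * season[name] for name in names)

full_prediction = INTERCEPT + contribution(FULL)
reduced_prediction = INTERCEPT + contribution(KEPT)

# What the dropped factors (x2, x3, x4) contributed in the full equation.
dropped_contribution = contribution(name for name in FULL if name not in KEPT)

print(round(full_prediction, 3), round(reduced_prediction, 3), round(dropped_contribution, 3))
```

The reduced prediction is higher than the full one because the dropped terms, mainly the assists term, contributed about -8.13 goals for this season.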