

MULTIPLE REGRESSION

In multiple regression, we work with one dependent variable and many independent variables. The purpose of multiple regression is to analyze the relationship between metric or non-metric independent variables and a metric dependent variable. If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable. For example, you could run a multiple regression looking at the relationship between weight (the dependent variable) and height, age, and sex (the independent variables).

Scaling: Dependent variable - interval or ratio scale (metric). Independent variables - nominal or ordinal scale (non-metric) or metric.

Types: There are two types of multiple regression:
- Forward selection: Start by choosing the independent variable that explains the most variation in the dependent variable. Choose a second variable that explains the most residual variation, and then recalculate the regression coefficients. Continue until no remaining variable "significantly" explains residual variation.
- Backward elimination: Start with all the variables in the model, and drop the least "significant", one at a time, until you are left with only "significant" variables.

Formula: The model for a multiple regression takes the form

Y = a + bX1 + cX2 + dX3 + eX4

where a is the constant (intercept) and b, c, d, e are the regression coefficients of X1 through X4.
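As an illustration of this model, the following is a minimal Python sketch (not part of the original text) that fits the weight/height/age/sex example by ordinary least squares; all data are simulated and every number is hypothetical.

```python
# Minimal sketch: fit Y = a + b*X1 + c*X2 + d*X3 by ordinary least squares.
# The weight/height/age/sex data below are simulated, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100
height = rng.normal(170, 10, n)   # cm
age = rng.uniform(20, 60, n)      # years
sex = rng.integers(0, 2, n)       # 0 = female, 1 = male (dummy-coded)

# Simulated dependent variable: weight in kg (hypothetical relationship)
weight = -50 + 0.6 * height + 0.1 * age + 8 * sex + rng.normal(0, 5, n)

# Design matrix with a leading column of ones for the constant a
X = np.column_stack([np.ones(n), height, age, sex])
coefs, _, _, _ = np.linalg.lstsq(X, weight, rcond=None)
a, b, c, d = coefs
print(f"a={a:.2f}, b(height)={b:.2f}, c(age)={c:.2f}, d(sex)={d:.2f}")
```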

Multiple regression characteristics:
- There should be one dependent variable and many independent variables.
- The independent variables may be correlated.
- Independent variables can be continuous.

Steps for tests of significance to be used in SPSS tables:
1. Use the F-test (ANOVA) to test for overall significance; the model is significant when p < 0.05.
2. Use a t-test on each regression coefficient to test whether that independent variable is significant (p < 0.05). A t-test evaluates the individual relationship between each independent variable and the dependent variable.
3. Report the beta values and the constant.
4. Although the independent variables can be correlated, there must be no perfect (or near-perfect) correlations among them, a situation called multicollinearity. If the Variance Inflation Factor (VIF) of a variable exceeds 7, eliminate that independent variable from the regression. (A sketch of steps 1-4 in code appears after this section.)

Merits and Demerits:

Level of familiarity: Multiple regression is one of the most commonly used statistical techniques, and many people are familiar with it, at least in outline. This will be especially true of people educated in the social, behavioral, or physical sciences; for this audience, familiarity is an advantage. On the other hand, if your audience is the general population, then many people will be unfamiliar with multiple regression; for this audience, familiarity is a disadvantage, and you might want to use a simpler statistic or rely entirely on graphs.

Error assumptions: Multiple regression makes four assumptions about the errors from the model; the errors are the differences between the predicted values of the dependent variable and the actual values. Multiple regression assumes that the errors are normally distributed, that the errors have constant variance, that the mean of the errors is zero, and that the errors are independent.

Flexibility: The independent variables can be numeric or categorical, interactions between variables can be incorporated, and polynomial terms can also be included. For example, in examining the relationship between weight and height, age, and sex, you could include height squared and the product of height and sex. Then the relationship between height and weight would differ for men and women, and the predicted difference in weight between a 5-foot-tall person and a 5-foot-1 person would not be the same as that between a 6-foot-tall person and a 6-foot-1 person.

Use of multiple variables: Multiple regression uses multiple independent variables, with each controlling for the others. For example, in the model of weight as related to height, age, and sex, the model estimates the effect of height controlling for age and sex. The parameter for height answers the question "What is the relationship between height and weight, given that a person is male or female and of a certain age?"
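Here is the hedged Python sketch of the significance checks listed above, using statsmodels in place of SPSS; it reuses the simulated weight/height/age/sex arrays from the earlier sketch, so the names and outputs are assumptions rather than results from the text.

```python
# Sketch of the SPSS-style checks: overall F-test (ANOVA), per-coefficient
# t-tests, beta values/constant, and VIFs for multicollinearity.
# Assumes the simulated arrays weight, height, age, sex defined earlier.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(np.column_stack([height, age, sex]))
model = sm.OLS(weight, X).fit()

print(model.f_pvalue)   # Step 1: overall significance, require p < 0.05
print(model.pvalues)    # Step 2: t-test p-value per coefficient, require < 0.05
print(model.params)     # Step 3: the constant and the b coefficients

# Step 4: Variance Inflation Factors (column 0 is the constant)
for i in range(1, X.shape[1]):
    vif = variance_inflation_factor(X, i)
    print(i, vif)       # per the text, drop variables with VIF > 7
```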


Bivariate Correlation Analysis: Bivariate correlation analysis differs from nonparametric measures of association and from regression analysis in two important ways. First, parametric correlation requires two continuous variables measured on an interval or ratio scale. Second, the coefficient does not distinguish between independent and dependent variables; it treats the variables symmetrically, since the correlation of Y with X has the same interpretation as the correlation of X with Y.

Pearson's Product Moment Coefficient r: The Pearson (product moment) correlation coefficient varies over a range of +1 through 0 to -1. The designation r symbolizes the coefficient's estimate of linear association based on sampling data; the population correlation is represented by the Greek letter rho (ρ).

Correlation coefficients reveal the magnitude and direction of a relationship. The magnitude is the degree to which variables move in unison or opposition. The size of a correlation of +.40 is the same as that of -.40; the sign says nothing about the size. Either represents a modest degree of correlation. The coefficient's sign signifies the direction of the relationship. Direction tells us whether large values on one variable are associated with large values on the other (and small values with small values). When the values correspond in this way, the two variables have a positive relationship: as one increases, the other also increases. Family income, for example, is positively related to household food expenditures: as income increases, food expenditure increases. Other variables are inversely related.

The Assumptions of r: Like other parametric techniques, correlation analysis makes certain assumptions about the data. Many of these assumptions are necessary for testing hypotheses about the coefficient. The first requirement for r is linearity. The second assumption is a bivariate normal distribution - that is, the data are from a random sample of a population in which the two variables are normally distributed in a joint manner. Often these assumptions, or the required measurement level, cannot be met; the analyst should then select a nonlinear or nonparametric measure of association.

Computation and Testing of r: The formula for calculating Pearson's r is:

r = Σ(Xi - X̄)(Yi - Ȳ) / [(n - 1) · sX · sY]    (1)

where n = the number of pairs of cases and sX, sY = the standard deviations for X and Y. Alternatively,

r = [n ΣXiYi - (ΣXi)(ΣYi)] / √{[n ΣXi² - (ΣXi)²][n ΣYi² - (ΣYi)²]}    (2)
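As a worked check of these formulas, here is a short sketch (hypothetical data) that computes r by the computational form (2) and verifies it against scipy:

```python
# Compute Pearson's r two ways: formula (2) by hand, then scipy.stats.pearsonr.
# The five (x, y) pairs are hypothetical.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 8.5])
n = len(x)

num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) *
              (n * np.sum(y**2) - np.sum(y)**2))
r_manual = num / den

r_scipy, p_value = stats.pearsonr(x, y)  # also returns a two-tailed p-value
print(r_manual, r_scipy, p_value)
```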

Simple Linear Regression:

When we take the observed values of X to estimate or predict corresponding Y values, the process is called simple prediction. When more than one X variable is used, the outcome is a function of multiple predictors. Simple and multiple predictions are made with a technique called regression analysis.

The Basic Model: A straight line is the simplest way to model the relationship between two continuous variables. The bivariate linear regression may be expressed as

Yi = β0 + β1·Xi + εi

where the value of the dependent variable Yi is a linear function of the corresponding value of the independent variable Xi in the ith observation, and εi is the error term. The slope, β1, and the Y intercept, β0, are known as regression coefficients. The slope, β1, is the change in Y for a 1-unit change in X. It is sometimes called the "rise over run" and is defined by the formula

β1 = ΔY / ΔX

That is, the ratio of the change in the rise of the line (ΔY) relative to the run or travel along the X axis (ΔX). Exhibit 19-10 shows a few of the many possible slopes you may encounter. The intercept, β0, is the value of the linear function when it crosses the Y axis; it is the estimate of Y when X = 0. A formula for the intercept based on the mean scores of the X and Y variables is

β0 = Ȳ - β1·X̄
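The least-squares estimates behind these formulas can be computed directly; the sketch below uses hypothetical temperature/price pairs, not the wine data discussed later in the text:

```python
# Least-squares slope and intercept:
#   slope = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))**2)
#   intercept = mean(Y) - slope * mean(X)
# The (x, y) values are hypothetical stand-ins for temperature and price.
import numpy as np

x = np.array([15.0, 17.0, 19.0, 21.0, 23.0, 18.0, 20.0, 22.0, 16.0, 24.0])
y = np.array([2500., 3000., 3600., 4100., 4700., 3300., 3900., 4400., 2800., 4900.])

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```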

Residuals: A residual is what remains after the line is fitted: (Y - Ŷ). Standardized residuals are comparable to Z scores, with a mean of 0 and a standard deviation of 1. In a residual plot, the standardized residuals should fall between 2 and -2, be randomly distributed about zero, and show no discernible pattern. All these conditions say the model is applied appropriately. In our example, we have one residual at -2.2, a random distribution about zero, and few indications of a sequential pattern. It is important to apply other diagnostics to verify that the regression assumptions (normality, linearity, equality of variance, and independence of error) are met. Various software programs provide plots and other checks of regression assumptions.

Predictions: If we wanted to predict the price of a case of investment-grade red wine for a growing season that averages 21, our prediction would be the point estimate Ŷ = β̂0 + β̂1(21) = 3899.67.

This is a point prediction of Y and should be corrected for greater precision. As with other confidence estimates, we establish the degree of confidence desired and substitute into the formula

Ŷ ± t · s · √(1 + 1/n + (X - X̄)² / SSX)

where
t = the two-tailed critical value for t at the desired level (95% in this example),
s = the standard error of estimate (also the square root of the mean square error from the analysis of variance of the regression model),
SSX = the sum of squares for X.

Substituting,

3899.67 ± (2.306)(538.559) · √(1 + 1/10 + (21 - 19.61)²/SSX) = 3899.67 ± 1308.29

We are 95 percent confident in our prediction that a case of investment-quality red wine grown in a particular year at an average temperature of 21 will be initially priced at 3899.67 ± 1308.29 French francs (FF), or from approximately 2591 to 5208 FF. The comparatively large bandwidth results from the amount of error in the model (reflected by s), some peculiarities in the Y values, and the use of a single predictor. It is more likely that we would want to predict the average price of all cases grown at 21. This prediction uses the same basic formula but omits the first term (1) under the radical. A narrower confidence band results, since the average of all Y values is being predicted from a given X. In our example the 95 percent confidence interval is 3899.67 ± 411.42, or from 3488 to 4311 FF.
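A hedged sketch of this interval computation, reusing the hypothetical x, y, slope, and intercept from the earlier sketch (the text's raw wine data are not given, so the outputs will differ from the FF figures above):

```python
# Prediction interval for an individual Y at x0, and the narrower interval
# for the mean response (drop the leading 1 under the radical).
# Assumes x, y, slope, intercept from the previous sketch.
import numpy as np
from scipy import stats

n = len(x)
y_hat = intercept + slope * x
s = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))  # standard error of estimate
t_crit = stats.t.ppf(0.975, df=n - 2)            # two-tailed 95% critical value

x0 = 21.0
pred = intercept + slope * x0
ss_x = np.sum((x - x.mean()) ** 2)               # sum of squares for X

half_individual = t_crit * s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / ss_x)
half_mean = t_crit * s * np.sqrt(1/n + (x0 - x.mean())**2 / ss_x)
print(pred - half_individual, pred + half_individual)  # individual Y
print(pred - half_mean, pred + half_mean)              # mean of all Y at x0
```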

The predictor we selected, 21, was close to the mean of X (19.61). Because the prediction and confidence bands are shaped like a bow tie, predictors farther from the mean have larger bandwidths. For example, values of 15, 20, and 25 produce confidence bands of 565, 397, and 617, respectively.

Testing the Goodness of Fit: With the regression line plotted and a few illustrative predictions made, we should now gather some evidence of goodness of fit - how well the model fits the data. The most important test in bivariate linear regression is whether the slope, β1, is equal to zero. We have already observed a slope of zero (line b). Zero slopes result from various conditions:
- Y is completely unrelated to X, and no systematic pattern is evident.
- There are constant values of Y for every value of X.
- The data are related but represented by a nonlinear function.

The t-test: To test whether β1 = 0, we use a two-tailed test (since the actual relationship may be positive, negative, or zero). The test follows the t distribution for n - 2 degrees of freedom:

t = β1 / s(β1)

where β1 was previously defined as the slope and s(β1) is the standard error of β1. We reject the null hypothesis, β1 = 0, because the calculated t is greater than any tabled t value for 8 degrees of freedom at α = .01. Therefore, we conclude that the slope is not equal to zero.
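A minimal check of this t-test in Python, continuing the hypothetical x, y from the sketches above; scipy.stats.linregress reports the slope, its standard error, and the two-tailed p-value for the null hypothesis that the slope is zero:

```python
# t-test for H0: slope = 0, using scipy.stats.linregress.
# Assumes the hypothetical x, y arrays from the earlier sketch.
from scipy import stats

res = stats.linregress(x, y)
t_stat = res.slope / res.stderr  # t = slope / standard error of the slope
print(t_stat, res.pvalue)        # reject H0 when p is below the chosen alpha
```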


MULTIDIMENSIONAL SCALING

Multidimensional Scaling (MDS) allows a researcher to measure an item in more than one dimension at a time. MDS is a class of procedures for representing the perceptions and preferences of respondents spatially by means of visual displays. Perceived or psychological relationships among stimuli are represented as geometric relationships among points in a multidimensional space; these geometric representations are often called spatial maps. The axes of the spatial map are assumed to denote the psychological bases or underlying dimensions respondents use to form perceptions and preferences for stimuli.

APPROACHES TO MDS
1. Metric approach
2. Non-metric approach

The metric approach to MDS treats the input data as interval-scale data and solves by applying statistical methods for converting the interval scale to a ratio scale while minimizing the dimensionality of the solution space. The non-metric approach first gathers non-metric similarities by asking respondents to rank order all possible pairs that can be obtained from a set of objects. Such non-metric data are then transformed into some arbitrary metric space, and the solution is obtained by reducing the dimensionality. A sketch of both approaches appears below.
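The two approaches can be illustrated with scikit-learn's MDS implementation; the dissimilarity matrix below is a hypothetical example, not data from the text:

```python
# Metric vs. non-metric MDS on a hypothetical 4-stimulus dissimilarity matrix.
import numpy as np
from sklearn.manifold import MDS

D = np.array([[0.0, 2.0, 5.0, 7.0],
              [2.0, 0.0, 4.0, 6.0],
              [5.0, 4.0, 0.0, 3.0],
              [7.0, 6.0, 3.0, 0.0]])

# Metric approach: treats the dissimilarities as interval-scaled distances.
metric_map = MDS(n_components=2, metric=True, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)

# Non-metric approach: uses only the rank order of the dissimilarities.
nonmetric_map = MDS(n_components=2, metric=False, dissimilarity="precomputed",
                    random_state=0).fit_transform(D)

print(metric_map)     # 2-D spatial map (one row of coordinates per stimulus)
print(nonmetric_map)
```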

CANONICAL ANALYSIS

This analysis can be used with both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables. Both metric and non-metric data can be used in the context of this multivariate technique. The procedure followed is to obtain a set of weights for the dependent and independent variables in such a way that the linear composite of the criterion variables has a maximum correlation with the linear composite of the explanatory variables. The main objective of canonical correlation analysis is to discover factors separately in the two sets of variables such that the multiple correlation between the sets of factors will be the maximum possible. Mathematically, the weighted composites of the two sets,

y = a1*y1 + a2*y2 + ... + ap*yp
x = b1*x1 + b2*x2 + ... + bq*bq

are chosen to have maximum common variance, i.e., maximum correlation. The resulting canonical correlation solution then gives an overall description of the presence or absence of a relationship between the two sets of variables.
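A hedged sketch of this idea using scikit-learn's CCA; the data are simulated, and the fitted weights correspond to the composite coefficients a and b above:

```python
# Canonical correlation: find weights so the composite of the Y set and the
# composite of the X set are maximally correlated. Data are simulated.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # explanatory set x1..x3
Y = X @ rng.normal(size=(3, 2)) + 0.5 * rng.normal(size=(50, 2))  # criterion set

cca = CCA(n_components=1).fit(X, Y)
x_scores, y_scores = cca.transform(X, Y)      # the two linear composites
r = np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1]
print(r)  # first canonical correlation
```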
