
Will Stevens
Dr. Kiker
AP Stats 7th
9/23/13

The Linear Regression Project: Oil Exportation vs. Gross Domestic Product

It has been claimed that World War 3 will be fought over oil. The world's dwindling supply, combined with humans' extreme dependence on the resource, is a recipe for disaster. For my linear regression project, I looked further into that dilemma by investigating oil exportation amounts per country, specifically oil exportation to the United States, because that pertains to me more. I then cross-referenced the top 15 oil supplier countries with their Gross Domestic Products to see if my hypothesized trend was correct. I suspected that the more oil a country sold to the United States, the higher its GDP would be, because oil is such a crucial part of economies. This analysis pertains to my life because I have to fill up my car every week or two, and seeing the prices fluctuate makes me wonder which countries I'm getting my oil and gas from. The correlation in the data is not what I had expected it to be, but it was interesting nonetheless!

The scatterplot itself does not give the conclusive results I was predicting. The explanatory variable was Oil Exportation in barrels per day. This is because one would generally assume that sales indicate income, not the other way around. The response variable was the Gross Domestic Product of the corresponding country, in billions of dollars, for the same reason that Oil Exportation was the explanatory variable.

There are no outliers in this data set. An outlier is defined here as any point above the mean plus two times the standard deviation, or below the mean minus two times the standard deviation, on both axes. For this data set, that would include points greater than 1822.28 or less than -683.76 on the x-axis and points greater than 2193.47 or less than -895.57 on the y-axis. There are points that would count as outliers on one of the axes, but not both.

There were two possibly influential points in the data set. Points (163, 2253) and (275, 2015), when removed from the data set, drastically alter the regression calculations. The slope increased by .2591, the y-intercept decreased by 415.176, the R^2 value increased by .707, and the R-value increased by .5434.

The Least Squares Regression Line for the original data is y^ = 382.583 + .4679x. The y-intercept is 382.583, which means that when x is 0, the y-value is predicted to be 382.583 billion dollars. The slope is .4679, which indicates that for every increase of one barrel per day, the predicted GDP increases by .4679 billion dollars. The R-value, which is also called the Correlation Coefficient, measures how strongly the data points follow a linear pattern, or how tightly they are grouped around a line. The R-value for these data is .3796, which is generally considered to be a pretty weak correlation. This means that the barrels of oil per day are a weak indicator of a country's GDP. The R^2 value, or Coefficient of Determination, is .144 for these data. This means that 14.4% of the variation in GDP can be explained by the linear relationship with barrels of oil per day.

However, with the influential points removed, the correlation is much higher. The Least Squares Regression Equation is y^ = -32.599 + .727x, the R-value is .923, and the R^2 value is .851. This indicates that the data without the influential points are strongly correlated and that a high percentage of the variation in GDP can be explained by the barrels of oil per day.

The residual plot of a set of data graphs the residual of each data point, which is its actual value minus the value predicted by the line of best fit, or LSRL. The residual plot for the original data was slightly concerning, as it showed the two influential points very much separated from the rest of the values, which were generally clumped below the residual = 0 line. This does not necessarily indicate that another shape would fit the data set better, but that a line is a tough shape to fit to these data. I also made a residual plot without the influential points, however, and my findings were much more reassuring. The graph was quite random, with a few clumps of data points but no distinct patterns. This means that without the influential points, linear is probably the most fitting shape for these data.

I also picked an x-value (403) from the middle of the data set and calculated its residual. By substituting the value in for x in the LSRL equation, I was able to calculate the predicted value (about 571.147). I subtracted that value from the actual value (210.3) and found that the residual for this point was about -360.847. That is a pretty large residual to have, so I calculated the same numbers using the LSRL of the data set without the influential points. This time I got a residual value of -50.09.
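The residual calculation described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up for the example, and the coefficients are the rounded values quoted in the essay, so the results may differ slightly from figures obtained on a calculator with unrounded values.

```python
# Rounded LSRL coefficients as quoted in the essay: (intercept, slope).
# x is oil exported per day; y is GDP in billions of dollars.
FULL_DATA = (382.583, 0.4679)           # all 15 countries
WITHOUT_INFLUENTIAL = (-32.599, 0.727)  # two influential points removed


def predict(intercept, slope, x):
    """Predicted y-value from a least-squares regression line."""
    return intercept + slope * x


def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted


# The point chosen in the essay: x = 403, actual GDP = 210.3.
x_value, actual_gdp = 403, 210.3

for label, (a, b) in [("full data", FULL_DATA),
                      ("without influential points", WITHOUT_INFLUENTIAL)]:
    y_hat = predict(a, b, x_value)
    print(f"{label}: predicted = {y_hat:.3f}, "
          f"residual = {residual(actual_gdp, y_hat):.3f}")
```

A negative residual means the line predicted a higher GDP than the country actually had.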
Both predicted values were overestimates, though the one without the influential points was much more accurate.

A linear regression is the best fit because the R^2 value and residual plot for the data set without the influential points clearly indicate that linear is the most appropriate, and adding the two values back, while they mess with some of the numbers, wouldn't affect the data set enough to make another type of regression more appropriate.

A career that would use this type of data could be anyone who works for oil companies and has to predict the economic status of countries that sell them oil, or anyone interested in tracing patterns amongst the top oil-exporting countries. That would be a very boring job, but someone may have to do it. That career would be relevant in our global society because the findings may tie in somehow with gas prices here in the United States, which affect all of us.

Essentially, the data can be looked at two ways. The original data are pretty much inconclusive and not very helpful, seeing as only 14% of the variation in y can be explained by the linear regression on x. We could safely conclude that the number of barrels of oil exported to the U.S. per day is, at best, a weak predictor of the GDP of the corresponding countries. However, if the two influential points are removed, it becomes a different story. When 85% of the variation in GDP can be explained by barrels of oil exported per day, we can make a strong case that while barrels of oil exported may not cause GDP to fluctuate, the two are very closely related. Overall, barrels of oil exported to the United States each day may not always predict the GDP of that country, but often the two are strongly tied together.
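As a rough consistency check on the summary statistics quoted in the essay, the sketch below recovers the means and standard deviations implied by the two-standard-deviation outlier limits and compares them with the regression results. It assumes the lower y-axis limit is negative (-895.57), and it relies on two standard facts: the least-squares line passes through the point of means, and its slope equals r times s_y / s_x. All names are illustrative.

```python
def mean_and_sd_from_limits(lower, upper):
    """Invert limits of the form (mean - 2*sd, mean + 2*sd)."""
    return (lower + upper) / 2, (upper - lower) / 4


# Outlier limits quoted in the essay.
# Assumption: the lower y limit is negative.
X_LIMITS = (-683.76, 1822.28)
Y_LIMITS = (-895.57, 2193.47)

x_mean, x_sd = mean_and_sd_from_limits(*X_LIMITS)
y_mean, y_sd = mean_and_sd_from_limits(*Y_LIMITS)

# Rounded LSRL for the original data, as quoted in the essay.
intercept, slope = 382.583, 0.4679

# Fact 1: the LSRL passes through (x_mean, y_mean).
y_at_x_mean = intercept + slope * x_mean

# Fact 2: slope = r * s_y / s_x, so r = slope * s_x / s_y.
r_implied = slope * x_sd / y_sd
r_squared = r_implied ** 2


def outlier_axes(x, y):
    """List the axes on which a point falls outside the limits."""
    axes = []
    if not X_LIMITS[0] <= x <= X_LIMITS[1]:
        axes.append("x")
    if not Y_LIMITS[0] <= y <= Y_LIMITS[1]:
        axes.append("y")
    return axes


print(f"y_mean = {y_mean:.2f}, LSRL at x_mean = {y_at_x_mean:.2f}")
print(f"implied r = {r_implied:.4f}, implied r^2 = {r_squared:.3f}")
for point in [(163, 2253), (275, 2015)]:
    print(point, "outside limits on:", outlier_axes(*point) or "neither axis")
```

Under these assumptions the implied values agree closely with the R-value and R^2 reported for the original data, and neither influential point is outside the limits on both axes, which is consistent with the statement that the data set has no outliers.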

Works Cited

"Petroleum and Other Imports." Company Level Imports. U.S. Energy Information Administration - EIA Independent Statistics and Analysis, 27 Sept. 2013. Web. 02 Oct. 2013. <http://www.eia.gov/petroleum/imports/companylevel/>.

"GDP (current US$)." Data - By Country. The World Bank, n.d. Web. 02 Oct. 2013. <http://data.worldbank.org/indicator/NY.GDP.MKTP.CD>.
