
Empirical Project: Mets Attendance (1986-2011)

Kevin Mulcahey ECO-231 Dr. Letcher The College of New Jersey


I. Statement of the Problem

This empirical study seeks to identify the explanatory variables that best account for the stadium attendance of the New York Mets from 1986 to 2011. The explanatory variables I have chosen to study are years since last playoff appearance, payroll (adjusted for inflation), number of All-Stars, winning percentage, and average batter's age. To examine the relationship between these explanatory variables and the dependent variable, attendance, I employ multiple regression, residual plots, normal probability plots, and other statistical analyses.

II. Review of Literature Related to the Variables

Before beginning the study, I consulted journal articles and analyses on Major League Baseball attendance, payroll, winning percentage, and related variables. The first article, written by Don N. MacDonald and Morgan O. Reynolds of Texas A&M University, analyzes the relationship between players' pay and their marginal revenue product, using many of the same explanatory variables as my empirical study. Marginal revenue product is the additional revenue a firm earns by hiring one more unit of labor. Their research on the 1986 and 1987 Major League Baseball seasons indicates that players are paid in line with what they earn for their organizations in sales. Payroll correlates directly with revenue for MLB organizations largely because of the institutions of free agency and final-offer arbitration. Free agency allows players to test the market and seek the best offer for their abilities among all major league teams. Arbitration allows for pay increases based upon performance. These two contractual outlets allow players' marginal revenue product to correlate more directly with ticket sales, attendance, and team revenue. These findings relate closely to my data: my dependent variable is attendance, and my explanatory variable with the lowest p-value is payroll (adjusted for inflation). My data also begin in 1986, as does the study by MacDonald and Reynolds.

The allowance for arbitration officially began in 1970, when a second MLB collective bargaining agreement provided for an impartial arbitrator, rather than the commissioner of baseball, to settle player contract disagreements. In the 1985 season, arbitrators found that owners across Major League Baseball were colluding to keep player salaries artificially low by reducing competitive bidding. For their collusion, owners were fined $280 million in damages, and team payrolls have risen steadily since.

The article by MacDonald and Reynolds also relates to another of my variables. Winning percentage, they say, is a less important predictor of attendance than statistics that forecast a team's success. For example, once a team's winning percentage has risen from .500 to .550, a further increase from .550 to .600 will not make a noticeable impact on stadium attendance. This is because fans view entertainment on an "ex ante" basis rather than an "ex post" basis. In other words, the forecast of a team's success is more conducive to sales and attendance than past performance. People are more likely to buy tickets when they expect a team to perform well than once the team is already doing well. Relating this idea to my variables, number of All-Stars would be a more significant predictor of attendance than winning percentage. Payroll and All-Stars are related in that the more popular and high-quality a roster is, the more attendance will increase in a given season.

Another article, written by Michael C. Davis of the University of Missouri-Rolla, analyzes the interaction between baseball attendance and winning percentage. According to Davis, this interaction may not be completely obvious. It is expected that when a team performs well, a "bandwagon effect" will take hold, and the team should therefore see an increase in attendance during and following seasons in which it played well on the field. The article implies that although winning percentage affects attendance directly, because winning has become increasingly important to fans in recent years, attendance could also affect winning percentage. When a team is successful and generates superior attendance and revenue, winning percentage should rise, since successful organizations have more room in their budgets to acquire high-quality players. In my regression, I chose payroll and winning percentage as explanatory variables and attendance as the dependent variable.

Interestingly, according to the article, only about half of the National League teams showed upticks in attendance in response to winning. This would indicate that the significance of winning percentage as a predictor of attendance varies by team. In the American League, some teams, such as the Yankees, actually showed a negative shock response to winning; it may be that fans have become almost indifferent to the team's consistent winning. However, the study concluded that in the long term, all ten of the sampled teams (Cubs, Reds, Yankees, White Sox, Phillies, Pirates, Indians, Tigers, Cardinals, Red Sox) showed positive attendance growth in response to winning percentage. The data also showed that for only one team, the Indians, did attendance have a positive effect on winning percentage. This would indicate that attendance, my chosen dependent variable, is the better choice of the two, and that winning percentage works better as an explanatory variable in Major League Baseball.

A final analysis, conducted by market research analyst David P. Kronheim, takes into account another variable that affected New York Mets attendance in the past 3 years. It discusses the effect of the stadium, which is relevant alongside the numerical data I employed in my analysis. Kronheim raises the point that when the Mets moved to their new stadium, Citi Field, in 2009, attendance was going to decrease regardless of performance. Shea Stadium had 57,365 seats, whereas Citi Field has only 41,800. From 2005 to 2007, the Mets had gains in attendance of more than 470,000 per year, which placed them among the top 2 teams in the National League in attendance increases. The Mets were very competitive during these last few years at Shea Stadium. Kronheim notes that "if the Mets had sold every single ticket possible in 2009, including player and comp tickets, their attendance still would have fallen by 656,243." However, the Mets still had quality attendance in 2009 at Citi Field, with 3,168,571 spectators. The large drop-off of 576,166 from 2009 to 2010 (an 18.4% decline) has to do with the smaller number of seats, as well as other factors. The Mets fell below a .500 winning percentage again in 2009-2011 and had fewer All-Stars. In fact, the smallest attendance at any Mets home game in 2008 was 45,321, which is more than 3,500 above Citi Field's capacity. The Mets were playoff contenders in their last year at Shea Stadium, although they were eliminated in the last game of the season. Attendance that year was 4,042,045.

I decided not to include the type of stadium in my own regression analysis, because the data date back to 1986 and the Mets have only been at Citi Field since 2009. The majority of the analysis comes from Shea Stadium (1986-2008), and the new stadium statistics would only appear to be outliers. However, I wanted to include this market research in my report because it could partially explain the drastic drop in the most recent data I have compiled (2009-2011).

III. Data Sources and Descriptions

In compiling my Mets data set from 1986 to 2011, I used two main sources. For the dependent variable, attendance, as well as the explanatory variables of winning percentage, years since last playoff appearance, payroll, and average batter's age, I used Baseball-Reference.com. For the number of All-Stars per year for the New York Mets, I used Mets.com. I also adjusted the payroll for each year from 1986 to 2010 for inflation in order to have the most accurate comparison possible. The inflation calculator on bls.gov aided me in this process.

As mentioned earlier, my data set captures the effect of years since last playoff appearance, payroll (adjusted for inflation), number of All-Stars, winning percentage, and average batter's age on stadium attendance for the New York Mets from 1986 to 2011 (Figure 1). For the first few years of the data, namely 1986 to 1990, the Mets were very successful. After the team made the playoffs in 1986 and 1988 and won the World Series in 1986, total season attendance ranged from 2.7 million to 3 million. In these 5 years, the Mets had high winning percentages, ranging from .537 to .667, and a total of 19 All-Stars, which is an extremely high number. In direct contrast, the Mets performed horribly from 1991 to 1998. The average batter's age during these years was much lower than during successful years (27, as opposed to 30 in 2000, when they reached the World Series), winning percentages were in the dismal range of .364 to .478, and payroll was much lower, highlighted by the $35,015,247.14 team payroll in 1996. Attendance in these years was very low, rarely breaking 2 million. The Mets performed poorly again from 2001 to 2005, performed well from 2006 to 2008, and performed poorly from 2009 to 2011. These eras indicate that the dependent variable, attendance, fluctuates along with most of the explanatory variables.

IV. Regression and Analysis

I have identified my explanatory variables, or X-variables, in this study as years since last playoff appearance (YSLPA), payroll adjusted for inflation (Payroll), winning percentage (Win%), number of All-Stars (All-stars), and average batter's age (Avg. Batt. Age). My dependent variable, or Y-variable, is attendance (Attendance). My data set is made up entirely of numeric explanatory variables. First, individual scatter plots of each explanatory variable were created. The scatter plot of the X-variable YSLPA against the Y-variable Attendance is shown below.

[Figure 2: YSLPA v. Attendance. Scatter plot of Attendance (y-axis) against Years Since Last Playoff Appearance (x-axis), with cubic trend line $y = 12415x^3 - 159530x^2 + 290507x + 3 \times 10^{6}$, $R^2 = 0.6341$.]

The scatter plot of YSLPA against Attendance indicates that as the number of years since the Mets last reached the playoffs increases, attendance decreases. Originally, I tried a linear trend line. It fit the data relatively well, aside from three data points at years 8 through 10. The R-square for the linear fit was only .4, however, so I tried quadratic and cubic equations to reflect the curvature of the data in years 8, 9, and 10. With the cubic equation, the R-square improved dramatically, from .4 to .634. Although it would appear that a negatively sloped straight line should fit the data, there is a possible explanation for the curvature. In years 8, 9, and 10 of missed playoff berths, according to the data set, the Mets were starting to come out of their decade-long slump and becoming possible playoff contenders. It is possible that fans, in anticipation of the team's improved play, started to attend more games.

Payroll, my second X-variable, has a scatter plot against the Y-variable that also reveals some curvature. Using the R-square as a measure of fit again, I decided that a linear fit was not appropriate. A linear equation had an R-square of .1, while the fourth-order (quartic) equation had an R-square of .48.

[Figure 3: Payroll v. Attendance. Scatter plot of Attendance (y-axis) against Payroll (x-axis), with quartic trend line $y = 2 \times 10^{-25}x^4 - 7 \times 10^{-17}x^3 + 1 \times 10^{-8}x^2 - 0.5384x + 1 \times 10^{7}$, $R^2 = 0.4812$.]

In choosing a fourth-order equation, I took into account the R-squares of the quadratic and cubic equations and decided that parsimony did not apply. With each increase in order, I received improvements in R-square ranging from .08 to .10. The gains in fit were therefore too large to justify stopping at a quadratic or cubic equation.
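The comparison of trend lines by R-square can be sketched in code. This is a minimal illustration only: the helper names `polyfit` and `r_squared` are hypothetical, the data points are made-up placeholders rather than the Mets data set, and the fitting routine is a plain least-squares solve, not necessarily the trend-line tool used for the figures in this study.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y for the Vandermonde matrix X.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1 - ss_res / ss_tot


# Placeholder values (NOT the Mets data): x = years, y = attendance in millions.
years = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
attendance = [3.0, 2.9, 2.6, 2.3, 1.9, 1.8, 1.6, 1.5, 1.8, 2.1, 2.7]

r2_by_degree = {
    d: r_squared(years, attendance, polyfit(years, attendance, d))
    for d in (1, 2, 3)
}
for d, r2 in r2_by_degree.items():
    print(f"degree {d}: R-square = {r2:.3f}")
```

Because each higher-order polynomial contains the lower-order one as a special case, the R-square from a least-squares fit cannot fall as the order rises. An increase in R-square is therefore expected with every added term, and the size of the increase is what has to be weighed against parsimony.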

For my third X-variable, All-stars, I again analyzed the coefficient of determination (R-square). Although the linear fit was stronger than it was for payroll, at .397, I still opted for a cubic polynomial equation. The R-square for this equation was .419.

[Figure 4: All-stars v. Attendance. Scatter plot of Attendance (y-axis) against Number of All-stars (x-axis), with cubic trend line $y = -41835x^3 + 270645x^2 - 49072x + 2 \times 10^{6}$, $R^2 = 0.4192$.]
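One way to check whether a higher-order trend line earns its extra terms is the adjusted R-square, which penalizes each added coefficient. The sketch below applies it to the R-squares reported above for Payroll and All-stars. It assumes 26 observations (one per season from 1986 to 2011); the function name `adjusted_r2` is illustrative, and this check is an addition for comparison, not part of the original analysis.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-square for a fit with p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Assumption: one observation per season, 1986 through 2011 inclusive.
N_SEASONS = 2011 - 1986 + 1  # 26

# (reported R-square, number of polynomial terms excluding the intercept)
reported_fits = {
    "Payroll, linear": (0.10, 1),
    "Payroll, quartic": (0.48, 4),
    "All-stars, linear": (0.397, 1),
    "All-stars, cubic": (0.419, 3),
}

adjusted = {
    name: adjusted_r2(r2, N_SEASONS, p)
    for name, (r2, p) in reported_fits.items()
}
for name, value in adjusted.items():
    print(f"{name}: adjusted R-square = {value:.3f}")
```

Under these assumptions, the payroll quartic still outperforms the linear fit after the penalty (about .38 versus .06), while the all-stars cubic falls slightly below the linear fit (about .34 versus .37), which suggests the gain from .397 to .419 may not justify the two extra terms.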

Clearly, as a team acquires more talented players, attendance increases. However, there is still some curvature. A possible explanation is that as a team goes from 1 to 2 to 3 All-Stars, its attendance prospects rise dramatically, but the excitement fans have for All-Stars 4, 5, and 6 increases at a slower rate. While attendance is still higher, this may suggest that a team performs only marginally better with more than 3 or 4 All-Stars.

The fourth X-variable, Win%, has a somewhat sporadic scatter plot. The goodness of fit, regardless of the type of equation, is relatively low. The highest R-square I was able to attain was .284, with a cubic equation. The curvature indicates that attendance is low for winning percentages from .35 to .45, with little increase across that range. This may suggest that even though .45 is a much higher winning percentage than .35, fans still do not wish to attend games because the team is not competitive in the championship season. There are, however, dramatic increases in attendance for winning percentages from .45 to .55. With these percentages, the Mets have a chance at the playoffs.

[Figure 5: Win% v. Attendance. Scatter plot of Attendance (y-axis) against Winning Percentage (x-axis), with cubic trend line $y = -2 \times 10^{8}x^3 + 4 \times 10^{8}x^2 - 2 \times 10^{8}x + 3 \times 10^{7}$, $R^2 = 0.2839$.]

There are still modest increases in attendance from .55 to .6, before it levels off. The trend line then shows attendance actually decreasing at a winning percentage of .667, but this could reflect an outlier, because the majority of the data has already leveled off between .6 and .65.

My final X-variable is the Mets' average batter's age, Avg. Batt. Age, for each season from 1986 to 2011.

[Figure 7: Avg. Batt. Age v. Attendance. Scatter plot of Attendance (y-axis) against Average Batter's Age (x-axis), with quartic trend line $y = -93161x^4 + 1 \times 10^{7}x^3 - 5 \times 10^{8}x^2 + 1 \times 10^{10}x - 7 \times 10^{10}$, $R^2 = 0.3337$.]


For this fifth variable, a polynomial equation was appropriate once again. A quartic equation had the highest R-square, with a value of .334. A linear equation would not have been as appropriate, because attendance appears to rise from average batter's ages of 27.5 to 28, level off from 28.5 to 29.5, and rise dramatically from age 30 on. A possible explanation for this curvature is that an older lineup may have more experience and perform better, which in turn affects attendance. This variable, however, proved to be an insignificant predictor of attendance, as shown by the multiple regression performed in the subsequent portion of this study.

Additional statistical analysis, aside from scatter plots and goodness of fit, is required in order to discover significant predictors of attendance. A multiple regression of attendance on all of the explanatory variables indicated that a form of variable selection was necessary (Figure 8). First, a global F-test was run on all of the variables together to decide whether any of them were significant predictors. The hypotheses for the global F-test are as follows:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a:$ at least one $\beta_j \neq 0$

For this hypothesis test, I decided to use an alpha of .15. My reasoning is that studies in the social sciences involve human beings, so there is more variation. Therefore, I do not want to reject any variables that could be found significant. The global F-test showed a very low P-value of .00002. Because this P-value is far below alpha, I rejected the null hypothesis and concluded that at least one of the explanatory variables was significant.
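For reference, the statistic behind a global F-test can be written in terms of the R-square of the full multiple regression. The expression below is the standard textbook form, given here as a sketch: it assumes $k = 5$ explanatory variables and $n = 26$ observations (one per season from 1986 to 2011, with no missing seasons). The R-square of the multiple regression is not restated in this section, so it is left symbolic.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)},
\qquad k = 5,\; n = 26
\;\Longrightarrow\;
F \sim F_{5,\,20} \text{ under } H_0
```

The null hypothesis that all five slope coefficients are zero is rejected when the P-value, the upper-tail probability of the observed $F$ under the $F_{5,\,20}$ distribution, falls below the chosen alpha of .15.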