
An Examination of Pairwise Comparisons of College Open Ultimate Frisbee Teams

Michael Silger 11/28/2012


Under the direction of Dr. Christopher Wikle Department of Statistics University of Missouri 146 Middlebush Hall, Columbia, MO 65211, United States

Contents

Abstract
Acknowledgements
1 Introduction
    1.1 The importance of an accurate ranking system
    1.2 The current ranking method
    1.3 Scrutiny of the current ranking method
    1.4 An alternative approach to ranking frisbee teams
2 Data
3 Methodology
    3.1 Explanation of Elo Algorithm
    3.2 Changing the Update Parameter to a Function
    3.3 Explanation of Assumptions for Proposed Algorithm
4 Results
5 Conclusion
6 Discussion
References

Abstract
The merit of pairwise comparisons in sports is a subject of much debate today, particularly in the ultimate frisbee community. The goal of this project is to study the current methods employed by USA Ultimate (USAU), the governing body of college ultimate frisbee, to see if a more robust and efficient approach to pairwise comparisons can be discerned. In the search for a more effective ranking scheme, I applied an adapted approach to Arpad Elo's chess rating algorithm, which yielded results similar to USAU's official rankings with a Spearman rank correlation of 0.974. My method provides a more statistically sound approach for ranking teams based on conventional principles established in the sport of ultimate frisbee.

Acknowledgements
I would first and foremost like to thank Dr. Christopher Wikle for assisting me on my endeavor to explore ranking algorithms. He has been pivotal to the success of this project as well as several statistical interests in my undergraduate and graduate career at the University of Missouri. I would like to thank Dr. Larry Ries and Dr. Lori Thombs for giving their time to serve on my committee. I would also like to thank Adam Gold for assisting me in data extraction, as well as Chelsea Tossing for relentlessly editing my paper. Lastly, I would like to thank the faculty members who have played a part in my education and all of my friends who have helped and supported me through this project.

1. Introduction
With 450 teams participating in open division conference qualifiers last year, the increasing popularity of ultimate frisbee has heightened the need for a reliable ranking system. Efforts to rank ultimate frisbee teams, however, face problems compounded by both the large number of teams and the nonstandard structure of the season. A regular season does not consist of a game or two per week against conference opponents, but rather an entire tournament in a weekend against competition from across the country. Once the regular season ends, the national qualifying system begins. The steps to qualifying for nationals involve first competing at a conference tournament against local opponents and then advancing to a regional tournament that determines national attendance.

1.1 The importance of an accurate ranking system


The regular season encompasses all USAU sanctioned tournaments up until a deadline, usually at the beginning of April. The goal of USAU is to be able to use the sanctioned tournament results to accurately rank teams from across the nation. In order to be considered in the ranking system, a team must compete in 10 games prior to sectionals and fill out the necessary paperwork to ensure their eligibility. Prior to the beginning of qualifying rounds, rankings are used to allocate bids to regionals and nationals. Teams are awarded these bids based on conference and regional tournament placement. Specifically, bids to nationals are allocated to match the number of teams from each region ranking in the top 20 at the end of the regular season. The accuracy of the rankings is paramount to teams ranked from 10 to 30, as those teams have the greatest potential to benefit from an extra bid allocated to their region.

1.2 The current ranking method


The current algorithm designed to rank frisbee teams was created by Sholom Simon (USA Ultimate College Rankings Algorithm, 2012). Simply put, each team receives rating points for every game played and those points are averaged. The algorithm takes the rating for each of the two competing teams and swaps their ratings prior to the contest to reflect the strength of schedule between the two competitors. We define R_n to be a team's new game rating and R_opp to be the opponent's current rating, such that

    R_n = R_opp ± x,    (1)

where the winner of the game receives R_opp + x and the loser receives R_opp - x.

Once a game is finished, a team can gain or lose rating points based on the margin of victory or loss. From equation (1), x is a factor that takes into account the score of a given game, as defined in equation (2):

    x = max(0.66, 2.5 · (losing score / winning score)).    (2)

There is a cap in place so that once the loser's score has been doubled, the maximum points gained/lost are attained.
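To make the swap-and-average structure concrete, the sketch below (mine, not USAU's published code) shows how per-game ratings and the season rating could be assembled; the helper names are hypothetical and the score factor x from equation (2) is passed in rather than recomputed.

    def per_game_ratings(winner_rating, loser_rating, x):
        """Per-game ratings under the swap-and-average scheme described above.

        Each team's game rating starts from its opponent's pre-game rating (the
        "swap" that credits strength of schedule) and is then moved up or down
        by the score factor x from equation (2).
        """
        winner_game_rating = loser_rating + x
        loser_game_rating = winner_rating - x
        return winner_game_rating, loser_game_rating

    def season_rating(game_ratings):
        """A team's season rating is the average of its per-game ratings."""
        return sum(game_ratings) / len(game_ratings)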

There is a weighted decay function in place to give recent games precedence when calculating the new rating, whereas games in the first week of competition receive the lowest weight. The weighting is doubled for games at regionals and tripled for games at nationals under the assumption that teams are at full strength for these contests. The algorithm is run every Tuesday, and the new ratings from the weekend are reinitialized as the week one starting values and run through to the current week. The reiteration is done 20 times, with convergence attained around the tenth iteration (USA Ultimate College Rankings Algorithm, 2012). The reiterative process is expected to account for underrated or overrated opponents.

While the bids to nationals are determined in early April, tournament play continues and weekly ratings will still reflect current performances. Rating calculation ends when the national tournament has concluded, resulting in a final ranking for the season, which is important when considering tournaments to attend in the upcoming season.

1.3 Scrutiny of the current ranking method


There are a few concerns over USAU's current methodology for ranking frisbee teams. The first stems from the algorithm's approach to reflecting strength of schedule. While effective, this approach allows the winning team to lose rating points under two circumstances as a result of a difference in ratings between the two competitors. An exceedingly large difference between the competitors' ratings can result in the winning team losing rating points regardless of the score. Alternately, the winner may lose rating points if the opponent scores more points than expected.

The second concern is the mercy threshold that takes effect once the losing team's score has been doubled. If a game is played to 15, winning 15-0 is considered equal to winning 15-7. Considering the loss of information that occurs for teams with shorter seasons, this is an oversight; the fewer games a team plays, the more vital it becomes to include every piece of information available in their rating. The mercy threshold may also dilute a team's true rating by undervaluing large margins of victory.

A third concern is that the algorithm disregards forfeits, and ratings are computed as if the game had never been played. This allows teams to abuse the system by protecting themselves if they believe their rating will suffer from a loss. If a team on the cusp of the top twenty is having an off weekend, they are able to forfeit without any repercussions. Conversely, teams playing well that are forfeited against lose the chance to gain rating points. This basic flaw in the rating system can undermine the competitive nature of the sport by discouraging game play.¹

¹ The third concern discussed here is a specific problem referred to as "gaming the rankings" in the ultimate frisbee community.

The final concern is the method by which the rating points gained for the margin of victory are calculated. The calculation forms a concave curve that disagrees with the fundamental nature of ultimate frisbee. A close game should indicate that the teams are similar in strength, and each additional point scored by the loser becomes more valuable. However, under the USAU model, the marginal value of the opponent's points scored on the winner increases as the number of points scored decreases. Figure 1 demonstrates the concave curve for the USAU point allocation system. There are a finite number of scores that can occur in a frisbee game, and Table 1 captures the marginal value for each possible score. As seen in Table 1, the less competitive the game, the higher the marginal value of each point becomes until the mercy threshold takes effect. A more intuitive model would have a convex function in which the marginal value of each point scored increases as the losing team scores more points. The USAU rating points calculator also has marginal values that carry a large weight when compared to the surrounding values. There is no logical reason for the large jump in values found around the mercy threshold seen in Table 1.

Figure 1: A graph giving the USAU point allocation scheme assuming the winning score is 15.
[Table 1 values not recoverable from the source: rows give adjacent losing-score pairs from 16-15 down to 1-0, and columns give winning scores from 3 through 16.]
Table 1: A table showing the marginal difference in the USAU rating points gained. The left column represents the two scores that are being compared.

1.4 An alternative approach to ranking frisbee teams


A different approach to ranking frisbee teams should be considered using a model that minimizes the concerns raised about the USAU algorithm. The model should reflect the same principles exhibited by participants in ultimate frisbee, such as the variability of a team's performance. Variability can be captured using statistical models; one of the first models developed for pairwise comparisons was created by Arpad Elo for the game of chess. Elo (1978) is able to measure pairwise comparisons of high dimensionality across a given time period and calculate a rating of relative strength for each competitor. The Elo model and its extensions can be seen in more complex models such as Glickman's (1993) Glicko rating system. The biggest difference between the two models is that the Glicko system includes an estimate of reliability in the rating of a contender. I chose to use the Elo method for its simplicity in implementation and programming.

For the game of chess, Elo (1978) starts off each competitor with a provisional rating period of about 30 games. Once the provisional period ends, a player is given a rating reflective of their performance over that span of time. That rating follows a distribution where the mean is equal to the rating and the variance is a measure of reliability; Elo (1965) and McClintock (1977) show that many performances of an individual will be normally distributed on an appropriate scale. The player's rating is updated after a given time period on a continuous or periodic basis and will eventually converge to its true strength rating. An unexpected result could be the consequence of a statistical fluctuation or an actual change in the player's ability; thus, new performances are weighted based on how much importance is given to past performances. The weight is a measure of reliability for a team's distribution. I use the Elo algorithm as the basis for my rating algorithm, with some minor modifications to the assumptions in the model. The specifics of the Elo algorithm and my adjustments to its assumptions will be discussed in detail in the methodology section.

2. Data
The data used in this analysis were collected from www.usaultimate.org in June 2012 and represent the 2012 USAU open sanctioned tournaments. The data were not readily accessible and had to be harvested using AutoHotKey, a program that allows the user to automate keystrokes. Using AutoHotKey, I was able to copy all of the information on the page for each competitor and paste it into a text file. I then converted the text file to an Excel document and parsed the data to obtain the information necessary to create two data sets. The first dataset corresponds to all of the teams listed on the RRI webpage (450 teams), while the second dataset includes only USAU sanctioned teams (371 teams). It was necessary to evaluate two separate datasets to obtain a strict comparison between the proposed algorithm and the USAU algorithm. A comparison is also done between the USAU sanctioned teams and teams that competed in USAU sanctioned tournaments to determine whether the exclusion of additional teams affects the final rankings; similar results are expected when comparing the two datasets. Irrelevant information was omitted; for instance, several outcomes were listed F-F or F-L, both of which have no merit when considering the winner of a game. Game outcomes with a single score reported, shown as 9-_, were also excluded. A sample of the data set is included in Figure 2.
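As a sketch of the cleaning step, assuming the AutoHotKey output has already been parsed into a pandas DataFrame with the column names shown in Figure 2 (the file name and exact filtering rules here are illustrative, not the original parsing script):

    import pandas as pd

    # Hypothetical file name; columns follow the layout shown in Figure 2.
    games = pd.read_csv("usau_open_2012.csv")

    # Drop forfeits reported as F-F or F-L and games with only one score reported
    # (e.g. 9-_), since neither carries usable score information for these datasets.
    games["WinnerScore"] = pd.to_numeric(games["WinnerScore"], errors="coerce")
    games["LoserScore"] = pd.to_numeric(games["LoserScore"], errors="coerce")
    games = games.dropna(subset=["WinnerScore", "LoserScore"])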

WinningTeam         LosingTeam           WinnerScore  LoserScore  RatePer  Tournament
Mississippi State   Mississippi State B  11           1           1        Cowbell Classic 2012
Mississippi State   Florida State B      11           3           1        Cowbell Classic 2012
Mississippi State   Mississippi          11           8           1        Cowbell Classic 2012
Mississippi State   Auburn               11           6           1        Cowbell Classic 2012
Mississippi State   Mississippi          13           10          1        Cowbell Classic 2012
Mississippi         Florida State B      11           3           1        Cowbell Classic 2012
Mississippi         Mississippi State B  11           1           1        Cowbell Classic 2012
Mississippi         Rhodes               11           1           1        Cowbell Classic 2012
Mississippi         Auburn               10           7           1        Cowbell Classic 2012
Figure 2: The column labeled RatePer represents the rating period in which the contest took place; a RatePer corresponding to 1 stands for the first week of competition.

3. Methodology
3.1 Explanation of Elo Algorithm
Elo (1978) defines a proper rating system as one that can effectively rank teams and provide a measure of the relative strength of competitors, however strength may be defined. The initial assumption in the model is that each competitor's rating follows a normal distribution. There are different approaches proposed by Elo (1978) for instituting pairwise comparisons. I chose to use his method of continuous rating updates, as I will be calculating the rating of each competitor on a weekly basis. The continuous rating method is given by the formula:

    R_n = R_o + K(W - W_e),    (3)

where R_n is the new rating and mean of the player's distribution, R_o is the old rating, K is the weighting function, and W is a binary variable taking the value 0 for a loss or 1 for a win. In addition, W_e for a particular team is the expected value of winning a game against some rated opponent, with rating R_opp, given by:

    W_e = 10^(R_o/400) / (10^(R_o/400) + 10^(R_opp/400)).    (4)

From (3), based on the game outcome, the new rating is updated by adding or subtracting points to the old rating. Because W is binary, the winner of the game will always gain points and the loser of the game will always lose points. This is important because, regardless of the score of a game, the winning team should not lose rating points. Another central aspect of the Elo model is that part of the rating update depends on the expected outcome of a game. If a team is rated far higher than its opponent, this is reflected in the expected value calculation. Therefore, the winner of the contest will gain points, but a high expected value for winning will result in a low point gain. This accurately portrays the construct of an ultimate frisbee tournament, as highly rated teams will play lower rated competition in pool play². This can also deter teams from playing only low rated competition to build a ranking that may not accurately represent their strength.

² Pool play is a round robin style tournament of a small group of teams, usually four or five, often taking place on the first day of competition in a 2-day tournament.
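A minimal sketch of the continuous update in equations (3) and (4) follows; the function names are mine, and K is left as an argument since Section 3.2 replaces it with a function of the score.

    def expected_score(rating, opp_rating):
        """Equation (4): expected value of a win against an opponent rated opp_rating."""
        return 10 ** (rating / 400) / (10 ** (rating / 400) + 10 ** (opp_rating / 400))

    def elo_update(rating, opp_rating, won, k=24):
        """Equation (3): new rating after a single game; won is 1 for a win, 0 for a loss."""
        return rating + k * (won - expected_score(rating, opp_rating))

    # Example: a 1500-rated team that beats a 1600-rated opponent gains about 15 points
    # with K = 24: elo_update(1500, 1600, won=1) is roughly 1515.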

During the season, R_n will move toward the mean of its normal distribution. The variance of this normal distribution is needed to model the consistency of each team and, under Elo's methodology, reflects the weight of a performance for a time period. In this case, K is defined as a weight used to blend the new rating with the previous rating. Under the Elo model, this weight is between 10 and 32, with greater values giving more weight to recent results.

As a modification to the standard algorithm, I wanted to incorporate the score differential into the model because it contains a significant amount of information. In an attempt to include the game score in the algorithm, I looked at the model from a Bayesian perspective with a normal likelihood and a normal prior. Each team's rating is assumed to be the result of a normal distribution, so the justification for a normal likelihood is obvious. The prior distribution quantifies our a priori understanding of the unobservable quantities of interest (Wikle and Berliner 2007) and should also be considered normal. The posterior mean can then be written as:

    μ_post = μ + [nτ² / (nτ² + σ²)](ȳ - μ) = μ + K(ȳ - μ).    (5)

The prior mean (μ) is adjusted toward the sample mean estimate (ȳ), where τ² is the variance of the prior, σ² is the variance of the sample estimate, and n is the number of observations in the sample. The K function in (5) is a ratio of the variances from the likelihood function and the prior information. This is very similar to our Elo algorithm and serves as the motivation for a different update function for K.

The ratio of variances is hard to define in the context of rating teams. The variance is understood as the under- and over-performance of a team, but there is no mathematical measure showing that a team has over- or under-performed aside from differences between ratings and opinion. Essentially, the ratio of variances weights the reliability of the new information using a priori knowledge. The score of the game is part of that a priori knowledge, which led to the decision to incorporate the score differential in the weighting update function.
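A small numerical illustration of the shrinkage in equation (5), under the normal likelihood and normal prior stated above; the numbers here are made up purely for illustration.

    def posterior_mean(prior_mean, prior_var, sample_mean, sample_var, n):
        """Equation (5): the prior mean is pulled toward the sample mean by the weight K."""
        k = (n * prior_var) / (n * prior_var + sample_var)
        return prior_mean + k * (sample_mean - prior_mean), k

    # A team rated 1500 a priori whose five observed games suggest a performance level of 1600:
    new_mean, k = posterior_mean(1500, prior_var=200 ** 2, sample_mean=1600, sample_var=400 ** 2, n=5)
    # k is about 0.56 and new_mean about 1556; a tighter prior (smaller prior_var) lowers K.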

3.2 Changing the Update Parameter to a Function


When considering the updated weighting function, I first tried to avoid some of the errors I believe to be present in the USAU algorithm. The primary issue with their updating scheme is that the curve of the points gained is concave; the value of scoring a point on the eventual winner decreases as the game becomes closer. In order to correct this, I looked at a function created by Murray (2012):

    K = U + (T - U) · (1 - p^(d-1)) / (1 - p^(w-1)),    (6)

where U = points awarded for winning on universe point³, T = total points possible awarded, p = percentage of points awarded, d = difference of the score from the game, and w = points the winner scored. The proposed allocation scheme corrects the concavity of the K weighting function, as seen in Figure 3. Figure 3 also illustrates that there is no cutoff value for beating a team; instead, all of the information indicated by the final score of the contest is taken into account. Table 2 shows that the marginal point allocation increases as the game becomes closer, and there are no unexpected outlying values. Forfeits are also included in the model and treated as the maximum number of points gained for the winner. I allotted 200 points possible (T), with 50 points awarded for a universe-point win (U) and p = 0.80. These values were chosen on a subjective assessment of point allocation in order to clearly discern differences in the score. My chosen values scale down the point allocation in comparison to the USAU algorithm, to avoid over-inflating the K weight values and to try to retain the normality assumption. In order to check the accuracy of the change when converting the weighting parameter to a function, results were also calculated by setting K equal to 24.

³ Universe point occurs when two teams are tied and the next point will win the game for either team.
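Written out under my reconstruction of equation (6) with the parameter values just given (T = 200, U = 50, p = 0.80), the weighting function is a short calculation; the function and argument names below are mine.

    def k_weight(winner_score, loser_score, total=200, universe=50, p=0.80):
        """Rating-point weight for the winner under the proposed allocation.

        A universe-point win (margin of one) returns `universe`; a shutout, or a
        forfeit treated as the maximum, returns `total`.
        """
        d = winner_score - loser_score      # margin of victory
        w = winner_score                    # points the winner scored
        return universe + (total - universe) * (1 - p ** (d - 1)) / (1 - p ** (w - 1))

    # k_weight(15, 14) gives 50.0 (universe point); k_weight(15, 0) gives 200.0 (shutout).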



Figure 3: The proposed rating point allocation scheme, assuming the winning score is 15.

Loser score / Winner score    3    4    5    6    7    8    9   10   11   12   13   14   15
14-13                                                                                    31
13-12                                                                               32   25
12-11                                                                          32   25   20
11-10                                                                     33   26   20   16
10-9                                                                 34   26   21   16   13
9-8                                                             35   27   21   16   13   10
8-7                                                        36   28   22   17   13   10    8
7-6                                                   38   29   22   17   13   11    8    7
6-5                                              41   30   23   18   14   11    8    7    5
5-4                                         45   33   24   18   14   11    9    7    5    4
4-3                                    51   36   26   19   15   11    9    7    5    4    3
3-2                               61   41   29   21   16   12    9    7    6    4    3    3
2-1                          83   49   33   23   17   12    9    7    6    4    3    3    2
1-0                          67   39   26   18   13   10    8    6    5    4    3    2    2
(For winning scores of 16 and 17, the corresponding 15-14 and 16-15 rows each take the value 31.)

Table 2: A table showing the marginal difference in the proposed rating points gained. The left column represents the two scores that are being compared.

3.3 Explanation of Assumptions for Proposed Algorithm


A continuous rating system updates a new rating after a single game; subsequently, each new rating is blended into the old rating. Elo (1978) proposes different methods of blending, but I discuss an alternative approach below. Elo (1978) suggests that long events should be divided into ratable segments for each application of (3). My methodology considers the entire weekend tournament to be a segment, treating games played on Saturday as equivalent to those taking place on Sunday. This assumption has some weaknesses, but I believe that the merit outweighs the drawbacks. A typical 2-day tournament seeds each team attending according to their perceived strength before either pool play or immediate bracket play commences. The results of pool play are then used to reseed the teams and put them into bracket play for varying places on Sunday. It is intuitive that pool play and bracket play are dependent on one another, but I believe that teams performing to expectations will be playing similarly rated competition for placement games. If a team over- or under-performs, it is reflected in the results of individual games as opposed to the final placement from the weekend. Therefore, each game will be treated as independent from the next game.

Another consideration when examining the structure of ratable segments is the order of the games played and how sequence can affect a team's rating. For instance, if a team is overrated and loses to a low rated opponent in the first game, the low rated opponent reaps the reward of playing the over-ranked opponent first. In an effort to combat rating points gained as a result of the order of the games, I randomly sample matches at each tournament without replacement. I believe this will nullify any order-imposed gains a team may receive. I will sample without replacement 100 times, which I believe is sufficient. I will then blend the 100 ratings by taking the mean rating for each team, which yields the new rating (R_n). The integrity of the normality assumption of the rating is still preserved, and any rating points gained from the order of games are forgone.
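A sketch of this resampling step, reusing the expected_score helper sketched in Section 3.1 and the k_weight function above; the data structures and function names are illustrative rather than the original implementation.

    import random

    def tournament_update(games, ratings, n_samples=100):
        """Average each team's post-tournament rating over random orderings of the games.

        games is a list of (winner, loser, winner_score, loser_score) tuples for one
        weekend; ratings maps team name to its pre-tournament rating.
        """
        totals = {team: 0.0 for team in ratings}
        for _ in range(n_samples):
            temp = dict(ratings)
            for winner, loser, ws, ls in random.sample(games, len(games)):
                delta = k_weight(ws, ls) * (1 - expected_score(temp[winner], temp[loser]))
                temp[winner] += delta           # the winner always gains points
                temp[loser] -= delta            # the loser always loses the same amount
            for team in totals:
                totals[team] += temp[team]
        return {team: totals[team] / n_samples for team in totals}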


The Elo algorithm requires a provisional rating period of 30 games before a participant is given a rating. Many teams cannot satisfy this condition, as USAU only requires a team to participate in ten games to be allowed in the ranking system. Therefore, in lieu of using a provisional rating period, a reiterative process similar to USAU's algorithm is used. Reiteration helps take into account the opponent's strength at the time of the match. The process helps to avoid over- and under-ranked teams by rerating the previous encounters with the inclusion of the new information provided by the new rating (R_n). Once a ratings period is completed (say R1), that week will be used as the initial values for the first week. The new initial values will then be computed up through the current ratings period (say R1new). The rank order of each team, R1 and R1new, will then be compared by the Spearman rank correlation. If the correlation measure is above 0.99, the process will continue on to the next week and then repeat the reiteration process. The Spearman rank correlation is believed to help reduce the number of reiterations required. If the rank order of teams becomes stable, then there is no reason to reiterate and unnecessarily spread the ratings of teams or waste computing time. The Spearman rank correlation is the most intuitive correlation measure because we are concerned with the ranked order of the teams, and it therefore provides a measure of correlation between iterations. In order to check the robustness of the reiteration process, Kendall's tau could be considered.
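A sketch of the stopping rule, using scipy's Spearman rank correlation; rerate_from is a hypothetical helper that reruns the weekly tournament updates from a given set of starting values.

    from scipy.stats import spearmanr

    def reiterate(ratings, games_by_week, current_week, threshold=0.99, max_iter=20):
        """Rerate weeks 1..current_week until the teams' rank order stabilizes."""
        for _ in range(max_iter):
            new_ratings = rerate_from(ratings, games_by_week, current_week)  # hypothetical helper
            teams = sorted(ratings)
            rho, _ = spearmanr([ratings[t] for t in teams],
                               [new_ratings[t] for t in teams])
            ratings = new_ratings
            if rho > threshold:
                break                      # rank order is stable; stop reiterating
        return ratings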

4. Results
In this section, a sample of only 30 teams is included due to the high dimensionality of the data set and the belief that these teams have the best chance to attend nationals.

Rank | USAU Algorithm | Parameter K (RRI dataset) | Parameter K (USAU dataset) | Function K (RRI dataset) | Function K (USAU dataset)
1  | Pittsburgh 1852 | Oregon 2528 | Oregon 2366 | Pittsburgh 3122 | Pittsburgh 3001
2  | Oregon 1829 | Pittsburgh 2453 | Pittsburgh 2290 | Wisconsin 2986 | Oregon 2864
3  | Wisconsin 1786 | Wisconsin 2396 | Wisconsin 2227 | Carleton College 2984 | Carleton College 2860
4  | Carleton College 1769 | Carleton College 2298 | Carleton College 2129 | Oregon 2984 | Wisconsin 2859
5  | Tufts 1740 | Minnesota 2264 | Tufts 2105 | Tufts 2797 | Tufts 2674
6  | Iowa 1692 | Tufts 2257 | Minnesota 2098 | Minnesota 2762 | Minnesota 2638
7  | Minnesota 1674 | Central Florida 2227 | Central Florida 2077 | Central Florida 2737 | Central Florida 2609
8  | Colorado 1655 | Texas 2116 | Texas 1964 | Luther 2651 | Luther 2527
9  | Central Florida 1648 | California 2091 | North Carolina 1941 | California 2601 | California 2477
10 | Michigan 1616 | North Carolina 2091 | California 1933 | Texas A&M 2594 | Texas A&M 2473
11 | California 1612 | Luther 2087 | Texas A&M 1929 | Colorado 2587 | Colorado 2463
12 | North Carolina 1600 | Stanford 2077 | Luther 1928 | Stanford 2578 | Stanford 2453
13 | Luther 1594 | Colorado 2075 | Stanford 1923 | Texas 2553 | Texas 2431
14 | Texas 1588 | Texas A&M 2075 | Ohio 1916 | Iowa 2550 | Iowa 2423
15 | Washington 1578 | Ohio 2056 | Colorado 1915 | North Carolina 2544 | North Carolina 2417
16 | Stanford 1566 | Iowa 2037 | Iowa 1884 | Michigan 2507 | Michigan 2378
17 | Whitman 1550 | Michigan State 2008 | Michigan State 1860 | Ohio 2494 | Ohio 2367
18 | Connecticut 1542 | Michigan 1992 | Connecticut 1848 | Michigan State 2430 | Michigan State 2301
19 | Illinois 1530 | Connecticut 1983 | Michigan 1845 | Ohio State 2414 | Ohio State 2288
20 | Michigan State 1527 | Washington 1982 | Georgia Tech 1835 | Georgia Tech 2402 | Georgia Tech 2280
21 | Ohio 1527 | Georgia Tech 1981 | North Carolina-Wilmington 1826 | North Carolina-Wilmington 2400 | North Carolina-Wilmington 2277
22 | Vermont 1521 | North Carolina-Wilmington 1969 | South Carolina 1824 | Washington 2397 | Washington 2273
23 | Ohio State 1516 | South Carolina 1967 | Washington 1824 | Connecticut 2374 | Connecticut 2248
24 | Texas A&M 1512 | Florida 1964 | Ohio State 1822 | Florida 2364 | Florida 2243
25 | North Carolina-Wilmington 1505 | Ohio State 1959 | Florida 1810 | Illinois 2350 | Illinois 2229
26 | Georgia 1499 | Illinois 1939 | Illinois 1787 | South Carolina 2339 | South Carolina 2217
27 | Kansas 1497 | Georgia 1906 | Georgia 1758 | Georgia 2311 | Georgia 2184
28 | Dartmouth 1488 | Vermont 1894 | Vermont 1756 | Vermont 2264 | Vermont 2143

Table 3: The column headings involving Parameter K reference K as a constant under the original logic of Elo (1978). Alternatively, the Function K headings reference the point-differential weighting function created by Murray (2012). The dataset used for each algorithm is denoted in parentheses. Lastly, the number following each team name is the final rating (PR) produced by that algorithm and dataset.

5. Conclusion
Due to the inherent subjectivity of rating schemes, there is no single best or most efficient strategy for creating a rating system. The first way to recognize whether a rating scheme is viable is to evaluate whether or not it agrees with the common belief held by the group that is being rated. The second evaluation would use the rating scheme as a predictive tool to see if the model produces results similar to those that occur in the real world. The second evaluative technique was not the focus of this paper and is a measure that can be explored at a later time. It should be noted that direct comparison of the PR measure across datasets and rating schemes is not accurate because of the difference in dimensionality and the rating update method. Also, when comparing the USAU Algorithm to Function K and Parameter K values, only the USAU dataset can be used. The full rankings are shown in Table 3, and the correlations between the different results are summarized in Table 4.

Results Compared                               Correlation
USAU Algorithm vs. Function K                  0.974
USAU Algorithm vs. Parameter K                 0.868
Parameter K vs. Function K (USAU dataset)      0.986
Parameter K vs. Function K (RRI dataset)       0.991

Table 4: The Spearman rank correlation is used to compare the datasets and the different weight update methodologies.

The RRI dataset was included to assess whether a difference existed between the final top-30 rankings produced from all competitors in USAU sanctioned tournaments and those produced from USAU sanctioned teams only. In the Parameter K top-30 results, minor changes in the order of teams can be observed. In the case of the Function K results, only two teams (Oregon and Wisconsin) differ in rank between the two datasets. This would suggest that the excluded teams are likely lower level teams that have little impact on the teams vying for a spot at nationals. Next, the Parameter K model was included to provide evidence that a change in weighting scheme is appropriate by giving baseline results for ranking teams. From Table 4, correlations of 0.986 and 0.991 between the Parameter K and Function K results are high enough to suggest that changing the weight function to reflect the score is appropriate. When comparing the overall outcomes from the USAU algorithm with my proposed Function K algorithm, the resulting Spearman rank correlation stands at 0.974; this high level of correlation suggests that the results are very similar. In conclusion, based on a subjective assessment of my final results and the high correlation measures observed, I believe I have developed a sound rating method.
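For reference, correlations like those in Table 4 are straightforward to compute from two rating vectors ordered by team; the five-team subset below simply reuses values from Table 3 as an illustration.

    from scipy.stats import spearmanr

    # The first five teams from Table 3, paired across the USAU algorithm and
    # Function K (USAU dataset): Pittsburgh, Oregon, Wisconsin, Carleton College, Tufts.
    usau_pr = [1852, 1829, 1786, 1769, 1740]
    function_k_pr = [3001, 2864, 2859, 2860, 2674]

    rho, _ = spearmanr(usau_pr, function_k_pr)
    print(f"Spearman rank correlation: {rho:.3f}")   # 0.900 for this five-team subset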


6. Discussion
In this section, I will discuss specific details of the USAU algorithm and my Function K algorithm for the USAU dataset, along with ideas for future work. I believe I have established a quicker method to rate teams when compared to the USAU algorithm. The USAU algorithm uses 400 iterations while my algorithm used only 122 iterations. The high correlation threshold allows my program to continue to the next week if there is an insignificant change to the order of the teams. This is different from the USAU approach because their rating values have the ability to converge to a number while mine do not. Their convergence is possible because winning teams may still lose rating points, whereas I adjusted that assumption so that winners consistently gain rating points. I cannot comment on the statistical approach for the USAU algorithm because there is no accessible formal paper detailing Sholom Simon's approach.

Burruss (2012) explains that for a team to attain a high rating under the USAU algorithm, all they should do is win. Simply put, he explains that strength of schedule has little bearing on a team's ability to rank in the top twenty, although the team cannot solely play weak competition, as explained in Section 1.3. This idea is conveyed by Table 3 and will be specifically discussed in the cases of Whitman, Iowa, and Texas A&M. Whitman is a textbook example of what was described earlier as gaming the rankings. They are within the top 20 in the USAU algorithm, yet outside the top 30 in my algorithm at the end of the year; I believe this is largely due to the inclusion of forfeits in my model. Whitman was able to play on par with several of the elite college teams, but at two tournaments where their rating was in jeopardy, they decided to forfeit. As previously discussed, I included forfeits in my calculations because neglecting them discourages game play. Iowa and Texas A&M are similar when discussing the effects of winning on a team's rating under the USAU algorithm. They earned high ratings by simply winning games, many by large margins, and had regular season records of 23-5 and 29-2, respectively. The final ratings of Iowa and Texas A&M fell within my top twenty, reflecting that my algorithm also values winning as a determinant of higher ratings.

The last part of the discussion will focus on future analyses of this methodology and other approaches of interest. Ideally, I would establish a less arbitrary means of choosing the parameters for my proposed function. While it would be possible to estimate these values, there is still a measure of subjectivity because the true rank of each team is unknown. Using values between 10 and 30, I could model the K function based on the volatility of a team and the score of a game. While this would preserve Elo's normality assumption, I believe that the results would be similar to my findings, albeit rescaled. I would also like to check the accuracy of my method by predicting game outcomes at nationals. The only way I can currently accomplish this is through the Parameter K method, as I do not have priors to predict the score of each team. I believe it would be possible to create such a model, and my current research reflects a good starting point for this. Next, I would like to do some research in the field of network analysis. I believe that network analysis would be very promising when discussing the bid allocation process for regional qualifiers and the national tournament. However, I think it would be difficult to utilize in rating teams due to the structure of the regular season. Network analysis relies on using clusters to rank its components, but teams rarely play within their entire conference or region prior to qualifying tournaments. This concern is purely speculative and necessitates further research into different methods of pairwise comparisons.


References
Burruss, L. (2012). Rankings Under the Hood. Skyd Magazine. Retrieved from http://skydmagazine.com/2012/03/rankings-under-the-hood/

Elo, A. E. (1965). Age Changes in Master Chess Performances. Journal of Gerontology, 20, 289-299.

Elo, A. E. (1978). The Rating of Chess Players, Past and Present.

McClintock, W. (1977). Statistical Studies of the Elo Rating System, 1974-77. Report to USCF Policy Board, privately produced.

Murray, T. (2012, September 17). An Interpretation and Critique of the USAU Ranking Algorithm. Skyd Magazine. Retrieved from http://skydmagazine.com/2012/09/an-interpretation-and-critique-of-the-usau-ranking-algorithm/

Wikle, C., & Berliner, M. (2007). A Bayesian tutorial for data assimilation. Physica D, 230, 1-16.

USA Ultimate. (2012). USA Ultimate College Rankings. Retrieved from http://www.usaultimate.org/competition/college_division/college_season/college_rankings.aspx

