Péter Bruck,1 Istvan Réthy,1 Jan Tobochnik,2 and Péter Érdi2
1. ProcessExpert Ltd, Budapest, Hungary
2. Kalamazoo College, Kalamazoo, MI, USA

Pagerank quickly became one of the most popular ranking algorithms. Page and Brin empirically set the crucial parameter of the algorithm, the damping factor (d), to 0.85. This value suited the WWW well but proved unsuitable for scientific citations, where the best results were achieved with d = 0.5. To explain this difference, we selected simple model graphs and studied the relationship between the structure of the network and the most appropriate value of d. As d increases, the relative ranking of the nodes changes. At a certain value of d the good hits suddenly disappear from the top of the list; this is called "rank reversal". The Pagerank equation lets us calculate the location of the rank reversal for our model graphs: it occurs at d = 1 - 1/s, where s is the number of inlinks pointing to the central node. Below this value of d the central node receives the highest Pagerank, i.e. Pagerank is a proper measure of centrality. The higher the number of inlinks, the higher the upper limit of Pagerank's applicability: if s is 2, Pagerank can be used in the range 0 to 0.5; if s is 7, the permissible range of d for the model graphs is 0 to roughly 0.85 (1 - 1/7 ≈ 0.857). For large networks no theoretical method is available to establish the upper limit of Pagerank's applicability. In the absence of field-proven analogies (such as the WWW or scientific citations), only trial and error can locate the rank reversal, which marks that upper limit. This problem could be avoided with an alternative, more refined method of stochasticity adjustment. Let us introduce a new node (O) that has both an inlink from and an outlink to every node.
This way the manual selection of the damping factor can be avoided: when a node with n outlinks receives a new outlink, the share it passes to each neighbor is multiplied by n/(n+1). If n is 1, the share is halved (a "damping" by a factor of 2); if the node has 100 neighbors, the damping is only about 1%. The damping of each node is thus adjusted automatically, and in the critical parts of the graph the change in the information flow is relatively small. Otherwise the computation exactly follows the original paper of Page and Brin; we therefore call this solution Pagerank 2.0. In the presentation we compare the ranking results and the stability of the proposed method with those of Pagerank on the US patent network, which has 4 million nodes and 44 million edges.
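The rank-reversal threshold d = 1 - 1/s can be checked numerically. The abstract does not specify its model graphs, so the sketch below assumes one hypothetical graph that is consistent with the stated formula: s leaf nodes link to a central node, which links to a single successor. The function names (`pagerank`, `star_with_tail`, `central_is_top`) are illustrative, and the iteration uses the plain Pagerank equation without any dangling-node correction, so only the ordering of the ranks is meaningful.

```python
def pagerank(n_nodes, edges, d, iterations=100):
    """Iterate PR(i) = (1 - d)/N + d * sum over inlinks j->i of PR(j)/outdeg(j).

    No dangling-node correction is applied, so the ranks need not sum to 1.
    """
    out_degree = [0] * n_nodes
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / n_nodes] * n_nodes
    for _ in range(iterations):
        new_rank = [(1.0 - d) / n_nodes] * n_nodes
        for src, dst in edges:
            new_rank[dst] += d * rank[src] / out_degree[src]
        rank = new_rank
    return rank


def star_with_tail(s):
    """Assumed model graph: s leaves -> central node -> one successor node."""
    central, successor = s, s + 1
    edges = [(leaf, central) for leaf in range(s)]
    edges.append((central, successor))
    return s + 2, edges, central


def central_is_top(s, d):
    """True if the central node holds the highest Pagerank at damping d."""
    n_nodes, edges, central = star_with_tail(s)
    rank = pagerank(n_nodes, edges, d)
    return rank[central] == max(rank)


if __name__ == "__main__":
    for s in (2, 7):
        threshold = 1.0 - 1.0 / s
        for d in (threshold - 0.05, threshold + 0.05):
            print(f"s={s}, d={d:.3f}: central node on top = {central_is_top(s, d)}")
```

On this assumed graph the central node has rank b(1 + ds) and its successor b(1 + d + d²s), with b = (1 - d)/N, so the successor overtakes the central node exactly when d > 1 - 1/s.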
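The auxiliary-node adjustment can be sketched in a few lines. This is one possible reading of the abstract, not the authors' implementation: node O is linked to and from every node, and the rank is then propagated with no damping factor at all. The names `share_factor` and `pagerank_aux_node`, the iteration count, and the test graph (7 leaves linking to a central node that links to one successor) are assumptions made for illustration.

```python
def share_factor(n):
    """Factor applied to each neighbor's share when a node with n outlinks gains one more."""
    return n / (n + 1)


def pagerank_aux_node(n_nodes, edges, iterations=200):
    """Rank nodes on the graph extended by an auxiliary node O (assumed reading).

    O receives a link from every node and links back to every node; the
    propagation then runs without a damping factor.
    """
    aux = n_nodes  # index of the auxiliary node O
    links = list(edges)
    for node in range(n_nodes):
        links.append((node, aux))
        links.append((aux, node))
    total = n_nodes + 1
    out_degree = [0] * total
    for src, _ in links:
        out_degree[src] += 1
    rank = [1.0 / total] * total
    for _ in range(iterations):
        new_rank = [0.0] * total
        for src, dst in links:
            new_rank[dst] += rank[src] / out_degree[src]
        rank = new_rank
    # Drop O and renormalize over the original nodes.
    mass = sum(rank[:n_nodes])
    return [r / mass for r in rank[:n_nodes]]


if __name__ == "__main__":
    s = 7
    central, successor = s, s + 1
    edges = [(leaf, central) for leaf in range(s)] + [(central, successor)]
    rank = pagerank_aux_node(s + 2, edges)
    print("central:", rank[central], "successor:", rank[successor])
    print("share factor for n=1:", share_factor(1), "for n=100:", share_factor(100))
```

Because every node gains exactly one extra outlink (to O), a node with few outlinks loses a large fraction of what it passes on, while a well-connected node is barely affected, which is the automatic per-node damping the abstract describes.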