
A Historical Note on Shuffle Algorithms

Derek O'Connor, University College Dublin

Durstenfeld's Random Permutation algorithm (1964) is shown to be the first optimal random permutation generator to be published. It is not merely a computer version of the older Fisher-Yates Shuffle algorithm (1938), which is not optimal. Please note that this paper was rejected by ACM TOMS because it was not considered to be research.

Categories and Subject Descriptors: CSS.I.1 [Computing Methodologies]: Random Permutation Generation

General Terms: Design, Algorithms, Performance

Additional Key Words and Phrases: Random permutation generators, Shuffle algorithms

ACM Reference Format: O'Connor, D. 2011. A Historical Note on Shuffle Algorithms. ACM Trans. Math. Softw. 1, 1, Article 1 (January 1), 4 pages. DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000

1. INTRODUCTION

Shuffle algorithms, as the name implies, are used to shuffle or randomly permute lists of objects. As such, they are a subset of random combinatorial generators. Shuffle algorithms are used in a large variety of settings, but especially in the design of statistical experiments. A more exotic use of a shuffle algorithm is this: Microsoft, under EU law, must offer buyers of its operating system a choice of Web browsers. Microsoft set up a website [Microsoft 2011] from which a choice of 5 browsers could be downloaded: Firefox, Opera, Safari, Chrome, and Explorer. These are arranged in a horizontal list and periodically shuffled so that no one browser appears in a favoured position. Unfortunately, Microsoft used a badly implemented shuffle algorithm which, ironically, put Microsoft's Explorer in a less-favoured position more often than the others [Weir 2010].

2. A GENERIC SHUFFLE ALGORITHM

The following generic algorithm generates a random permutation of a set $S$ of size $n$:

    for $k \leftarrow n, n-1, \ldots, 2$ do
        Choose an element $r_k$ at random from $S$
        $S \leftarrow S - \{r_k\}$                              (1)
        $\pi[k] \leftarrow r_k$

The loop variable $k$ is the current size of $S$, $r_k$ is the element chosen randomly at the $k$th iteration, and the array $\pi$ maintains the elements of $S$ in the random order in which they were chosen. The final step of the algorithm is $\pi[1] \leftarrow S$, which moves the remaining element of $S$ to $\pi$. We note that this algorithm is a special case of selecting a sample of size $n$ from a population of size $n$, without replacement.

It is obvious that the running time of any implementation of this generic algorithm will depend, crucially, on how the set operations are implemented. The set operations required here are addition (union), deletion, and, possibly, a membership test to ensure the "without replacement" condition. The complexity of these operations will be $O(n)$, $O(\log n)$, or $O(1)$, depending on the algorithms and data structures used.

Author's address: Derek O'Connor, Donard, Co Wicklow, Ireland.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org. © 1 ACM 0098-3500/1/01-ART1 \$10.00 DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000

ACM Transactions on Mathematical Software, Vol. 1, No. 1, Article 1, Publication date: January 1.

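The generic algorithm (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the set $S$ is held as a plain list, so each deletion costs $O(n)$ and the whole run is $O(n^2)$. The function name `generic_shuffle` and the optional `rng` parameter are hypothetical and do not come from the paper.

```python
import random


def generic_shuffle(items, rng=random):
    """Random permutation by repeated selection without replacement."""
    s = list(items)               # the set S, held here as a list (assumption)
    n = len(s)
    pi = [None] * n               # pi[k - 1] plays the role of the paper's pi[k]
    for k in range(n, 1, -1):     # k = n, n-1, ..., 2 is the current size of S
        r_k = rng.choice(s)       # choose an element r_k at random from S
        s.remove(r_k)             # S <- S - {r_k}; an O(n) deletion on a list
        pi[k - 1] = r_k           # pi[k] <- r_k
    if s:
        pi[0] = s[0]              # final step: pi[1] <- the remaining element
    return pi
```

Replacing the list with a structure that supports faster random selection and deletion changes the cost of each iteration, which is the dependence on set operations noted above.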
3. DURSTENFELD'S SHUFFLE ALGORITHM

The first $O(n)$ shuffle or random permutation generator to be published was by Richard Durstenfeld [Durstenfeld 1964]. This algorithm came to the notice of programmers because it was included in Knuth [Knuth 1969], page 125, as Algorithm P. A succinct statement of the algorithm is:

    for $k \leftarrow n, n-1, \ldots, 2$ do
        Choose at random a number $r$ in $[1, k]$
        Interchange $\pi[r]$ and $\pi[k]$,                      (2)

where $\pi[1 \ldots n]$ is the array to be shuffled or randomly permuted. This is a truly elegant algorithm, computationally and mathematically.

Computationally, it handles the set operations by dividing the array $\pi$ into an unshuffled set $\pi[1, \ldots, k]$ and a shuffled set $\pi[k+1, \ldots, n]$. At each iteration $k$, it chooses a random element $\pi[r]$ from the unshuffled set $\pi[1, \ldots, k]$ and interchanges it with $\pi[k]$. This moves $\pi[r]$ into the new shuffled set $\pi[k, \ldots, n]$, where it remains, unmoved, for all subsequent iterations. This is done in $O(1)$ time and space. Thus, the time and space complexity of Durstenfeld's algorithm is $O(n)$, which is optimal.

Mathematically, the algorithm is elegant because it implicitly uses the well-known lemma that every permutation is a product of transpositions. At each iteration $k$, the algorithm chooses a random number $r_k \in [1, \ldots, k]$ and performs a transposition $(\pi[k], \pi[r_k])$. Thus the random permutation produced can be written as

    $\pi = (\pi[n], \pi[r_n])(\pi[n-1], \pi[r_{n-1}]) \cdots (\pi[k], \pi[r_k]) \cdots (\pi[2], \pi[r_2])$,

where $(\pi[k], \pi[r_k])$ is the transposition or interchange of $\pi[k]$ and $\pi[r_k]$.
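A minimal Python sketch of algorithm (2) is given here for illustration. It assumes 0-based indexing, so the paper's $k = n, \ldots, 2$ becomes `k = n-1, ..., 1`. The name `durstenfeld_shuffle` and the `rng` parameter are hypothetical conveniences, not part of Durstenfeld's published procedure.

```python
import random


def durstenfeld_shuffle(pi, rng=random):
    """Shuffle the list pi in place: n - 1 swaps, O(1) extra space."""
    for k in range(len(pi) - 1, 0, -1):
        r = rng.randint(0, k)          # uniform on 0..k, both ends included
        pi[r], pi[k] = pi[k], pi[r]    # pi[k:] is now the shuffled part
    return pi
```

Each iteration performs one transposition, so the loop is a direct reading of the product of transpositions described above. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.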
4. THE FISHER-YATES SHUFFLE ALGORITHM

Knuth [Knuth 1998], page 145, states: "This algorithm [Algorithm P] was first published by R. A. Fisher and Frank Yates [Statistical Tables (London 1938), Example 12], in ordinary language, and by R. Durstenfeld [CACM 7 (1964), 420] in computer language."

I believe Knuth is wrong in attributing Algorithm P (Durstenfeld's algorithm) to Fisher & Yates. I will show that Durstenfeld's Random Permutation algorithm was a new, optimal shuffling algorithm when it appeared in 1964, in the same sense that Hoare's QuickSort algorithm was a new, optimal sorting algorithm when it appeared in 1962.

The discussion that follows is based on [Fisher & Yates 1963], Example 12, page 37, which is reproduced verbatim below. This allows us to identify the original Fisher-Yates Shuffle algorithm and the improved algorithm of C. R. Rao.¹

¹ I have broken the original first paragraph into two parts so that the old and new methods are clearly visible.

Example 12: Required to arrange 8 treatments, numbered 1–8, in random order.

[Pre-6th ed. Method]. This operation can be performed by selecting one of the treatments at random from the eight, then selecting a second from the seven that remain, and so on. When the number of treatments is at all large, however, this procedure is tiresome, since each treatment must be deleted from a list as it is selected and a fresh count made for each further selection.

[New method by C.R. Rao]. To avoid this C.R. Rao (78) has proposed an alternative method. The one-dimensional version of this, suitable for the present example, consists of taking 10 cells numbered 0–9, and allocating the numbers 1–8 to these according to a sequence of 8 single digit random numbers. Thus, using the first column of Table XXXIII(I), which begins 0, 9, 1, 1, 5, 1, 8, 6 we allocate 1 to cell 0, 2 to cell 9, 3 to cell 1, 4 to cell 1, etc., the complete allocation being

    Cell:      0   1       2   3   4   5   6   7   8   9
    Numbers:   1   3,4,6   -   -   -   5   8   -   7   2

The three numbers in cell 1 must now be permuted. This can be done by the same process, using the next three random numbers 3,5,1 to give the order 6,3,4, so that the final permutation is 1, 6, 3, 4, 5, 8, 7, 2. Alternatively this . . . etc., etc.

Here is the Fisher-Yates algorithm in formal algorithmic style, where $S[1, \ldots, n]$ is the list to be shuffled:

    for $k \leftarrow n, n-1, \ldots, 2$ do
        Choose at random a number $r$ in $[1, k]$
        Scan $S[1, \ldots, n]$ for the $r$th unmarked element $S[u]$    (3)
        $\pi[k] \leftarrow S[u]$, and mark $S[u]$

We can clearly see that this method is not the Durstenfeld shuffle: it is an $O(n^2)$ algorithm because it needs to do an $O(n)$ scan of $S$ at each iteration.² Equally clearly, Rao's method is not the Durstenfeld shuffle: apart from the fact that it uses more than $n-1$ random numbers, it allocates, in this example, 3 numbers to one cell which must be permuted subsequently. This is not remotely similar to Durstenfeld's algorithm.
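The cost of the strike-out method can be seen in a short Python sketch. It is an illustration only, assuming a boolean `marked` array in place of crossing out entries on paper; the name `fisher_yates_original` and the `rng` parameter are hypothetical. For simplicity the loop also runs for $k = 1$, where the choice is forced.

```python
import random


def fisher_yates_original(s, rng=random):
    """Strike-out shuffle: O(n^2) because of the scan on every iteration."""
    n = len(s)
    marked = [False] * n
    pi = [None] * n
    for k in range(n, 0, -1):      # k unmarked elements remain
        r = rng.randint(1, k)      # 1-based rank among the unmarked elements
        count = 0
        for u in range(n):         # O(n) scan for the r-th unmarked element
            if not marked[u]:
                count += 1
                if count == r:
                    break
        pi[k - 1] = s[u]           # pi[k] <- S[u]
        marked[u] = True           # mark S[u]
    return pi
```

The inner scan is what the Durstenfeld interchange removes: there, the unselected elements always occupy the first $k$ positions, so no counting is needed.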
5. EXTENSIONS

Three very useful extensions or modifications of Durstenfeld's algorithm are of note.


5.1. Pike's Modification

The first, by Pike [Pike 1965], stops the algorithm after $m \le n$ iterations.

    for $k \leftarrow n, n-1, \ldots, n-m+1$ do
        Choose at random a number $r$ in $[1, k]$
        Interchange $\pi[r]$ and $\pi[k]$,                      (4)

This is a nice generalization of Durstenfeld because it returns a sample of size $m$ from a population of size $n$, without replacement, in $O(m)$ time. The sample is $\pi[n-m+1, \ldots, n]$, and the population is the original $\pi[1, \ldots, n]$. This is a particularly useful modification if $m \ll n$.
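Algorithm (4) can be sketched in Python as below, again with 0-based indices, so the sample occupies the last `m` positions of the list. The name `pike_sample`, the `rng` parameter, and the range check on `m` are illustrative assumptions rather than part of Pike's remark.

```python
import random


def pike_sample(pi, m, rng=random):
    """Return m items sampled without replacement; pi is permuted in place."""
    n = len(pi)
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    # 0-based k runs n-1, ..., n-m, which is exactly m iterations.
    for k in range(n - 1, n - m - 1, -1):
        r = rng.randint(0, k)
        pi[r], pi[k] = pi[k], pi[r]
    return pi[n - m:]              # the paper's pi[n-m+1, ..., n]
```

For example, `pike_sample(list(range(1000)), 5)` performs only five interchanges once the population list exists.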
5.2. Sattolo's Modification

This modification by Sattolo [Sattolo 1986] demonstrates that a small change in a combinatorial algorithm can have a dramatic effect.

    for $k \leftarrow 2, 3, \ldots, n$ do
        Choose at random a number $r$ in $[1, k-1]$
        Interchange $\pi[r]$ and $\pi[k]$,                      (5)

Note that the iteration counter $k$ goes from 2 up to $n$, and that the random number $r$ is drawn from $[1, k-1]$. This small modification generates random cyclic permutations, and is, obviously, optimal.
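The effect of drawing $r$ from $[1, k-1]$ rather than $[1, k]$ can be checked with a small Python sketch. It assumes the array starts as the identity permutation on `0..n-1` and uses 0-based indices. The names `sattolo_cycle` and `is_single_cycle` are hypothetical helpers introduced only for this illustration.

```python
import random


def sattolo_cycle(n, rng=random):
    """Return a random cyclic permutation of 0..n-1 as a list (i -> pi[i])."""
    pi = list(range(n))                # start from the identity (assumption)
    for k in range(1, n):              # the paper's k = 2, ..., n
        r = rng.randint(0, k - 1)      # the paper's r in [1, k-1]; never k
        pi[r], pi[k] = pi[k], pi[r]
    return pi


def is_single_cycle(pi):
    """True if following i -> pi[i] from 0 visits every index exactly once."""
    n = len(pi)
    if n == 0:
        return True
    i, steps = pi[0], 1
    while i != 0:
        i = pi[i]
        steps += 1
    return steps == n
```

Because `r` can never equal `k`, every iteration joins position `k` to the existing cycle, so `is_single_cycle(sattolo_cycle(n))` holds for every `n`.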
5.3. Knuth's Modification

Knuth [Knuth 1998], page 145, notes that if we want just a random permutation of the integers $\{1, 2, \ldots, n\}$, then no interchange is necessary.

    for $k \leftarrow 1, 2, \ldots, n$ do
        Choose at random a number $r$ in $[1, k]$
        $\pi[k] \leftarrow \pi[r]$ and $\pi[r] \leftarrow k$              (6)

Note that initially $\pi[i] = i$, $i = 1, 2, \ldots, n$, and that the iteration counter $k$ goes from 1 up to $n$.
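A Python sketch of algorithm (6) follows. It keeps the paper's 1-based values and maps them onto a 0-based list, and it initialises the list to the identity as the text describes. The name `random_permutation` and the `rng` parameter are hypothetical.

```python
import random


def random_permutation(n, rng=random):
    """Random permutation of 1..n built with assignments, not interchanges."""
    pi = list(range(1, n + 1))     # initially pi[i] = i, as in the text
    for k in range(1, n + 1):      # k = 1, 2, ..., n
        r = rng.randint(1, k)      # r in [1, k]
        pi[k - 1] = pi[r - 1]      # pi[k] <- pi[r]
        pi[r - 1] = k              # pi[r] <- k
    return pi
```

After iteration `k`, the first `k` entries hold a permutation of `1..k`, so the permutation grows one value at a time.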
² See O'Connor [O'Connor 2011] for the implementation and analysis of the Fisher-Yates and Durstenfeld algorithms.


6. CONCLUSION


For the reasons given above, I believe Durstenfeld should get sole credit for his algorithm. It may have been part of the programmers' folklore at the time, but Durstenfeld was the first to publish this algorithm, although many people at that time were writing algorithms for the exhaustive enumeration of permutations. Durstenfeld's algorithm, despite its simplicity, is very subtle. It uses the minimum number of random numbers to perform $n-1$ random transpositions; it uses no extra space, apart from 3 or 4 scalars. Also, it can generate partial and cyclic permutations with just slight modifications. The only reason I can see for not ranking Durstenfeld's Shuffle with Hoare's QuickSort is that permuting things is less important (and difficult) than sorting them.

REFERENCES
DURSTENFELD, R. 1964. ACM Algorithm 235: Random Permutation. Communications of the ACM 7, 7, 420.

FISHER, R. A. & YATES, F. 1963. Statistical Tables for Biological, Agricultural and Medical Research, 6th Ed. Edinburgh: Oliver & Boyd. http://digital.library.adelaide.edu.au/coll/special/fisher/

KNUTH, D. E. 1969. Seminumerical Algorithms, 1st Ed. The Art of Computer Programming Series, vol. 2. Addison-Wesley, Reading, MA.

KNUTH, D. E. 1998. Seminumerical Algorithms, 3rd Ed. The Art of Computer Programming Series, vol. 2. Addison-Wesley, Reading, MA.

MICROSOFT 2011. Microsoft Browser Choice. http://www.browserchoice.eu/BrowserChoice/

O'CONNOR, D. 2011. A Historical Note on the Fisher-Yates and Durstenfeld Shuffle Algorithms. http://www.scribd.com/doc/64349616/

PIKE, M. C. 1965. Remark on ACM Algorithm 235: Random Permutation. Communications of the ACM 7, 7, 420.

SATTOLO, S. 1986. An algorithm to generate a random cyclic permutation. Information Processing Letters 22, 315–317.

WEIR, R. 2010. The New Microsoft Shuffle. http://www.robweir.com/blog/2010/03/new-microsoft-shuffle.html
