
Algorithms: search and sorting

(from Wikipedia and MSDN)

Compiled and edited by S. Chepurin, 2009

Contents

Search algorithms
  Brute-force search
  Linear search
  Binary search
  Interpolation search
  Tree traversal
  Tree search

Sorting algorithms
  Comparison of algorithms
  Often used methods
  Memory usage
  Comparison based sorting
    Bubble sort
    Quicksort
    Heapsort
  Non-comparison sort
    Counting sort

Appendix
  Optimization in search and sorting
  C/C++ functions
    bsearch
    _lfind
    _lsearch
    qsort
  Samples on search algorithms
    Linear search
    Binary search
    Interpolation search
  Samples on sorting algorithms
    Bubble sort
    Selection sort
    Heapsort
    Counting sort
    sort and partial sort (STL)
    Sorting using binary search tree
    Dictionary hashing algorithm (MSDN)

Search algorithms
In computer science, search algorithms - i.e. the algorithms used to find a particular item in a set - are generally divided into uninformed algorithms (used on unsorted lists) and informed ones (used on already sorted lists), which apply knowledge about the structure of the search space to reduce the amount of time spent searching.

List search

List search algorithms are used to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in computer science, the computational complexity of these algorithms has been well studied.

Search on unsorted list

These are uninformed search algorithms that do not take into account the specific nature of the problem. As such, they can be implemented in general, and the same implementation can then be used in a wide range of problems thanks to abstraction. The drawback is that most search spaces are extremely large, and an uninformed search (especially of a tree structure) will take a reasonable amount of time only for small samples.

The linear (sequential) search simply examines each element of the list, comparing it with the searched one. It has an "expensive" O(n) running time, where n is the number of items in the list, but it can be used directly on any unordered list. Search based on abstract tree data structures (binary search trees) requires O(log n) time. These algorithms first go through the list to prepare the data, so the search itself is performed on already sorted data. The implementation is based on pointers, thus allowing fast insertion and removal of items. See associative array for more discussion of list search data structures.

Search on sorted list

These are informed search algorithms. Many informed algorithms are based on tree data structures, and can be extended to work for graphs as well. A separate example in this category is the hash table, where the search is based on calculating a hashing function for the item to get an index into the table.

- Binary search algorithm - selects the next search range by always dividing the search interval of a sorted array in half. It runs in O(log n) time, which is significantly better than linear search for large lists of data.
- Interpolation search - searches a sorted array by estimating the next position to check based on a linear interpolation of the search key and the values at the ends of the search interval. This is better than binary search only for large sorted lists of uniformly distributed data, with average O(log (log n)) time. It has a worst-case running time of O(n).
- Hash tables - require constant O(1) time to find a table's item (the same as an array).
- Search based on abstract data structures such as binary search trees.

Also, you may consider these lesser-known specialized algorithms:

- Fibonaccian search - searches a sorted array by narrowing possible locations to progressively smaller intervals. Begin with two Fibonacci numbers, p (F(n)) and q (F(n+1)), such that p < n ≤ q, where n is the size of the array. The first step checks location p. The size of the next interval is p if the key is less than the item at that location, or q - p (F(n-1)) if it is greater. Note: this is similar to a binary search, but it needs only subtraction, instead of a division by two or a shift right, to compute the next position.
- Jump search - searches a sorted array by checking every j-th item until the right block is found, then doing a linear search within it. The optimum for n items is j = √n. Also known as block search (see the sketch after this list).
- Secant search - searches a sorted array by estimating the next position to check based on the values at the two previous positions checked. Note: it is called "secant search" because it uses the secant of the function at two successive points to approximate the derivative in the Newton-Raphson formula. Although the theoretical execution time is better than interpolation search or binary search, coding is tricky, and the gains from faster convergence are offset by higher costs per iteration.
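Since jump search is described concretely above, a minimal Java sketch of it follows (the block size j = √n is from the text; the method name and array layout are illustrative assumptions):

public class JumpSearchDemo {
    // Returns the index of key in the sorted array a, or -1 if absent.
    static int jumpSearch(int[] a, int key) {
        int n = a.length;
        if (n == 0) return -1;
        int step = (int) Math.floor(Math.sqrt(n)); // optimal block size j = sqrt(n)
        int prev = 0;
        // Jump ahead block by block until the block's last element
        // is at least the key, or we run off the end of the array.
        while (prev < n && a[Math.min(prev + step, n) - 1] < key) {
            prev += step;
        }
        // Linear search inside the one block that may contain the key.
        for (int i = prev; i < Math.min(prev + step, n); i++) {
            if (a[i] == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 8, 13, 21, 34, 55};
        System.out.println(jumpSearch(a, 21)); // prints 4
        System.out.println(jumpSearch(a, 3));  // prints -1
    }
}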

Note: every algorithm has its own advantages and drawbacks, closely connected to the character of the data and the size of the list. A good example is an unbalanced binary tree, which is very inefficient for storing a large ordered (or partially ordered) list and searching for an item in it. Most list search algorithms, such as linear search, binary search, and binary search trees, can be extended with little additional cost to find all values less than or greater than a given key, an operation called range search. The exception is hash tables, which cannot perform such a search.

Tree search

In a binary tree structure, one of the tree traversal methods is used, starting from the root and simply going through the nodes of the subtrees. In a binary search tree, the procedure starts from the root node and goes through the nodes of one of the subtrees, selecting the right or left link depending on the comparison with the searched item, until the node is found. See chapter "Tree traversal" for detailed information.

Other types

- String searching algorithms search for patterns within strings.
- Genetic algorithms use ideas from evolution as heuristics for reducing the search space.
- Minimax algorithm, used in game theory.
- Ternary search.

Brute-force search
In computer science, brute-force search, or exhaustive search, also known as generate and test, is a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement. For example, a brute-force algorithm to find the divisors of a natural number n is to enumerate all integers from 1 to n and check whether each of them divides n without remainder. For another example, consider the popular eight queens problem, which asks to place eight queens on a standard chessboard so that no queen attacks any other. A brute-force approach would examine all the 64!/56! = 178,462,987,637,760 possible placements of 8 pieces on the 64 squares and, for each arrangement, check whether no queen attacks any other.

Brute-force search is simple to implement, and will always find a solution if it exists. However, its cost is proportional to the number of candidate solutions, which, in many practical problems, tends to grow very quickly as the size of the problem increases. Therefore, brute-force search is typically used when the problem size is limited, or when there are problem-specific heuristics that can be used to reduce the set of candidate solutions to a manageable size. The method is also used when the simplicity of implementation is more important than speed. This is the case, for example, in critical applications where any errors in the algorithm would have very serious consequences, or when using a computer to prove a mathematical theorem. Brute-force search is also useful as a "baseline" method when benchmarking other algorithms or metaheuristics. Indeed, brute-force search can be viewed as the simplest metaheuristic. Brute-force search should not be confused with backtracking, where large sets of solutions can be discarded without being explicitly enumerated (as in the textbook computer solution to the eight queens problem above).

Implementing the brute-force search

Basic algorithm

In order to apply brute-force search to a specific class of problems, one must implement four procedures: first, next, valid, and output. These procedures should take as a parameter the data P for the particular instance of the problem that is to be solved, and should do the following:
1. first(P): generate a first candidate solution for P.
2. next(P, c): generate the next candidate for P after the current one c.
3. valid(P, c): check whether candidate c is a solution for P.
4. output(P, c): use the solution c of P as appropriate to the application.

The next procedure must also tell when there are no more candidates for the instance P after the current one c. A convenient way to do that is to return a "null candidate" Λ, some conventional data value that is distinct from any real candidate. Likewise, the first procedure should return Λ if there are no candidates at all for the instance P. The brute-force method is then expressed by the algorithm

c := first(P)
while c ≠ Λ do
    if valid(P, c) then output(P, c)
    c := next(P, c)

For example, when looking for the divisors of an integer n, the instance data P is the number n. The call first(n) should return the integer 1 if n ≥ 1, or Λ otherwise; the call next(n, c) should return c + 1 if c < n, and Λ otherwise; and valid(n, c) should return true if and only if c is a divisor of n. (In fact, if we choose Λ to be n + 1, the tests are unnecessary, and the algorithm simplifies considerably.)
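A minimal Java sketch of this divisor instance, using 0 as the null candidate Λ (the class and method names are illustrative, not from the source):

public class BruteForceDivisors {
    static final int NONE = 0; // conventional "null candidate"

    // The instance data P is simply the number n whose divisors we want.
    static int first(int n)            { return n >= 1 ? 1 : NONE; }
    static int next(int n, int c)      { return c < n ? c + 1 : NONE; }
    static boolean valid(int n, int c) { return n % c == 0; }
    static void output(int n, int c)   { System.out.println(c + " divides " + n); }

    public static void main(String[] args) {
        int n = 24;
        // The brute-force loop: enumerate every candidate and test it.
        for (int c = first(n); c != NONE; c = next(n, c)) {
            if (valid(n, c)) output(n, c);
        }
    }
}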

Common variations

The brute-force search algorithm above will call output for every candidate that is a solution to the given instance P. The algorithm is easily modified to stop after finding the first solution, or a specified number of solutions; or after testing a specified number of candidates, or after spending a given amount of CPU time.

Combinatorial explosion

The main disadvantage of the brute-force method is that, for many real-world problems, the number of natural candidates is prohibitively large. For instance, if we look for the divisors of a number as described above, the number of candidates tested will be the given number n. So if n has sixteen decimal digits, say, the search will require executing at least 10^15 computer instructions, which will take several days on a typical PC. If n is a random 64-bit natural number, which has about 19 decimal digits on the average, the search will take about 10 years. This steep growth in the number of candidates as the size of the data increases occurs in all sorts of problems. For instance, if we are seeking a particular rearrangement of 10 letters, then we have 10! = 3,628,800 candidates to consider, which a typical PC can generate and test in less than one second. However, adding one more letter (only a 10% increase in the data size) will multiply the number of candidates by 11 (a 1000% increase). For 20 letters, the number of candidates is 20!, which is about 2.4×10^18, or 2.4 million million million; and the search will take about 10,000 years. This unwelcome phenomenon is commonly called the combinatorial explosion.

Methods to speed up brute-force search

One way to speed up a brute-force algorithm is to reduce the search space, that is, the set of candidate solutions, by using heuristics specific to the problem class. For example, consider the popular eight queens problem, which asks to place eight queens on a standard chessboard so that no queen attacks any other. Since each queen can be placed on any of the 64 squares, in principle there are 64^8 = 281,474,976,710,656 (over 281 million million) possibilities to consider. However, if we observe that the queens are all alike, and that no two queens can be placed on the same square, we conclude that the candidates are all possible ways of choosing a set of 8 squares from the set of all 64 squares, which means 64!/56!/8! = 4,426,165,368 (less than 5 thousand million) candidate solutions, about 1/60,000 of the previous estimate. Actually, it is easy to see that no arrangement with two queens on the same row or the same column can be a solution. Therefore, we can further restrict the set of candidates to those arrangements where queen 1 is on row 1, queen 2 is on row 2, and so on, all in different columns. We can describe such an arrangement by an array of eight numbers c[1] through c[8], each of them between 1 and 8, where c[1] is the column of queen 1, c[2] is the column of queen 2, and so on. Since these numbers must all be different, the number of candidates to search is the number of permutations of the integers 1 through 8, namely 8! = 40,320, about 1/100,000 of the previous estimate, and 1/7,000,000,000 of the first one (see the sketch at the end of this chapter). As this example shows, a little bit of analysis will often lead to dramatic reductions in the number of candidate solutions, and may turn an intractable problem into a trivial one. This example also shows that the candidate enumeration procedures (first and next) for the restricted set may be just as simple as those of the original set, or even simpler. In some cases, the analysis may reduce the candidates to the set of all valid solutions; that is, it may yield an algorithm that directly enumerates all the solutions (or finds one solution, as appropriate), without wasting time with tests and the generation of invalid candidates. For example, consider the problem of finding all integers between 1 and 1,000,000 that are evenly divisible by 417. A naive brute-force solution would generate all integers in the range, testing each of them for divisibility. However, that problem can be solved much more efficiently by starting with 417 and repeatedly adding 417 until the number exceeds 1,000,000, which takes only 2398 (= 1,000,000 / 417) steps, and no tests.

Reordering the search space

In applications that require only one solution, rather than all solutions, the expected running time of a brute-force search will often depend on the order in which the candidates are tested. As a general rule, one should test the most promising candidates first. For example, when searching for a proper divisor of a random number n, it is better to enumerate the candidate divisors in increasing order, from 2 to n - 1, than the other way around, because the probability that n is divisible by c is 1/c. Moreover, the probability of a candidate being valid is often affected by the previous failed trials. For example, consider the problem of finding a 1 bit in a given 1000-bit string P. In this case, the candidate solutions are the indices 1 to 1000, and a candidate c is valid if P[c] = 1. Now, suppose that the first bit of P is equally likely to be 0 or 1, but each bit thereafter is equal to the previous one with 90% probability. If the candidates are enumerated in increasing order, 1 to 1000, the number t of candidates examined before success will be about 6, on the average. On the other hand, if the candidates are enumerated in the order 1, 11, 21, 31, ..., 991, 2, 12, 22, 32, etc., the expected value of t will be only a little more than 2. More generally, the search space should be enumerated in such a way that the next candidate is most likely to be valid, given that the previous trials were not. So if the valid solutions are likely to be "clustered" in some sense, then each new candidate should be as far as possible from the previous ones, in that same sense. The converse holds, of course, if the solutions are likely to be spread out more uniformly than expected by chance.

Alternatives to brute-force search

There are many other search methods, or metaheuristics, which are designed to take advantage of various kinds of partial knowledge one may have about the solution. Heuristics can also be used to make an early cutoff of parts of the search. One example of this is the minimax principle for searching game trees, which eliminates many subtrees at an early stage in the search. In certain fields, such as language parsing, techniques such as chart parsing can exploit constraints in the problem to reduce an exponential complexity problem into a polynomial complexity problem. The search space for problems can also be reduced by replacing the full problem with a simplified version. For example, in computer chess, rather than computing the full minimax tree of all possible moves for the remainder of the game, a more limited tree of minimax possibilities is computed, with the tree being pruned at a certain number of moves, and the remainder of the tree being approximated by a static evaluation function.
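As an illustration of the restricted candidate set discussed above, here is a Java sketch that enumerates the 8! = 40,320 column permutations of the eight queens problem and tests only the diagonals (a reconstruction for illustration, not code from the source):

public class QueensPermutations {
    static int count = 0;

    // c[row] is the queen's column; rows and columns are distinct by
    // construction, so only diagonal attacks need checking.
    static boolean valid(int[] c) {
        for (int i = 0; i < c.length; i++)
            for (int j = i + 1; j < c.length; j++)
                if (Math.abs(c[i] - c[j]) == j - i) return false; // same diagonal
        return true;
    }

    // Generate every permutation of c[k..7] by swapping, then test it.
    static void permute(int[] c, int k) {
        if (k == c.length) {
            if (valid(c)) count++;
            return;
        }
        for (int i = k; i < c.length; i++) {
            int t = c[k]; c[k] = c[i]; c[i] = t; // choose element i for position k
            permute(c, k + 1);
            t = c[k]; c[k] = c[i]; c[i] = t;     // undo the choice
        }
    }

    public static void main(String[] args) {
        permute(new int[] {0, 1, 2, 3, 4, 5, 6, 7}, 0);
        System.out.println(count + " solutions"); // prints "92 solutions"
    }
}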

Linear search
In computer science, linear search (also known as sequential search) is a search algorithm that can be used to search an unsorted set of data for a particular value. It operates by checking every element of a list, one at a time in sequence, until a match is found. Linear search runs in O(n). If the data are distributed randomly, the expected number of comparisons that will be necessary is:

(n + 1) / (k + 1)

where n is the number of elements in the list and k is the number of times that the value being searched for appears in the list. The best case is that the value equals the first element tested, in which case only 1 comparison is needed. The worst case is that the value is not in the list (or appears only once, at the end of the list), in which case n comparisons are needed.

The simplicity of linear search means that if just a few elements are to be searched, it is less trouble than more complex methods that require preparation, such as sorting the list to be searched or building more complex data structures, especially when entries may be subject to frequent revision. Another case is when certain values are much more likely to be searched for than others, and it can be arranged that such values will be amongst the first considered in the list. The following pseudocode describes the linear search technique:

For each item in the list:
    Check to see if the item you are looking for matches the item in the list.
        If it matches, return the location where you found it (the index).
        If it does not match, continue searching until you reach the end of the list.
If we get here, the item does not exist in the list; return -1.

In computer implementations, it is usual to search the list in order, from element 1 to N (or 0 to N - 1, if array indexing starts with zero instead of one), but a slight gain is possible by searching in the reverse order. Suppose an array A having elements 1 to N is to be searched for a value x; if it is not found, the result is to be zero.
for i := N : 1 : -1 do          %Search from N down to 1. (The step is -1.)
    if A[i] = x then QuitLoop i;
next i;
Return(i);                      %Or otherwise employ the value.

Implementations of the loop must compare the index value i to the final value to decide whether to continue or terminate the loop. If this final value is some variable such as N, then a subtraction (i - N) must be done each time; but in going down from N, the loop termination condition is a comparison with a constant, and moreover a special constant: zero. Most computer hardware allows the sign of a value to be tested, especially the sign of a value in a register, and so execution would be faster. In the case where the array is indexed from zero, the loop would be "for i := N - 1 : 0 : -1 do" and the test on the index variable would be for it being negative, not zero. The pseudocode as written relies on the value of the index variable being available when the for-loop's iteration is exhausted, as being the value it had when the loop condition failed or a QuitLoop was executed. Some compilers take the position that on exit from a for-loop no such value is defined, in which case it would be necessary to copy the index variable's value to a reporting variable before exiting the loop, or to use another control structure such as a while loop, or else explicit code with go to statements, in pursuit of the fastest possible execution. The following code example for the Java programming language is a simple implementation of a linear search.
public int linearSearch(int[] a, int valueToFind) {
    // a[] is an array of integers to search.
    // valueToFind is the number to be found.
    // Returns the position of the value if found, or -1 otherwise.
    for (int i = 0; i < a.length; i++) {
        if (valueToFind == a[i]) {
            return i;
        }
    }
    return -1;
}

Linear search can be used to search an unordered list. Unordered lists are easy to implement as arrays or linked lists, and insertion and deletion can be done in constant time. The simplicity of a linearly searched unordered list means it is often the first method chosen when implementing lists that change in size while an application runs. If this later proves to be a bottleneck, it can be replaced with a more complicated scheme. The average performance of linear search can be improved by using it on an ordered list. In the case of no matching element, the search can then terminate at the first element which is greater (or lesser) than the unmatched target element, rather than examining the entire list. An ordered list is in general a more efficient data structure for searching than an unordered list. A binary search can often be used with an ordered list instead of a linear search; it is more difficult to implement correctly, but it examines much less than the entire list to determine the presence or absence of an element. As the number of elements in the list grows, or as the number of searches increases, something other than a linear search becomes more and more desirable. Another common method is to build up a hash table and then do hash lookups, as sketched below.
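For instance, a minimal Java sketch of trading a one-time O(n) indexing pass for expected O(1) lookups (the HashMap-based layout is an assumption of this sketch, not a prescription from the text):

import java.util.HashMap;
import java.util.Map;

public class HashLookupDemo {
    public static void main(String[] args) {
        int[] a = { 42, 7, 19, 3 };
        // Build the index once: O(n).
        Map<Integer, Integer> index = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            index.putIfAbsent(a[i], i); // keep the first position of each value
        }
        // Each subsequent lookup is expected O(1) instead of O(n).
        System.out.println(index.getOrDefault(19, -1)); // prints 2
        System.out.println(index.getOrDefault(5, -1));  // prints -1
    }
}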

See also

- Binary search
- Ternary search
- Hash table


External links

- C++ Program - Linear Search
- C Program - Linear Search

Binary search
A binary search algorithm (or binary chop) is a technique for finding a particular value in an already sorted (ordered) list. It makes progressively better guesses, closing in on the sought value, by selecting the middle element of the current interval, comparing its value to the target value, and determining whether the selected value is greater than, less than, or equal to the target value. A guess that turns out to be too high becomes the new upper bound of the interval, and a guess that is too low becomes the new lower bound. Pursuing this strategy to find the target value, the algorithm cuts the search range in half on each iteration. Binary search is an example of a divide and conquer search algorithm.

The algorithm

The most common application of binary search is to find a specific value in a sorted list. A common task, for example, is to find the index, or numbered place, of the value in the list. This is useful because, given the index, other data structures will contain associated information. Suppose a data structure containing the classic collection of name, address, telephone number and so forth has been accumulated, and an array is prepared containing the names, numbered from 1 to N. A query might be: what is the telephone number for a given name X? To answer this, the array would be searched and the index (if any) corresponding to that name determined. Appropriate provision must be made for the name not being in the list (typically by returning an index value of zero). If the list of names is in sorted order, a binary search will find a given name with far fewer probes than the simple procedure of probing each name in the list, one after the other as in a linear search, and the procedure is much simpler than organizing a hash table, though that would be faster still, typically averaging just over one probe. This applies for a uniform distribution of search items, but if it is known that a few items are much more likely to be sought than the majority, then a linear search with the list ordered so that the most popular items come first may do better. The binary search begins by comparing the sought value X to the value in the middle of the list; because the values are sorted, it is clear whether the sought value would belong before or after that middle value, and the search then continues through the correct half in the same way. Only the sign of the difference is inspected: there is no attempt at an interpolation search based on the size of the differences. The most straightforward implementation is recursive, searching the subrange defined by the comparison:
BinarySearch(A[0..N-1], value, low, high) {
    if (high < low)
        return -1 // not found
    mid = (low + high) / 2
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid-1)
    else if (A[mid] < value)
        return BinarySearch(A, value, mid+1, high)
    else
        return mid // found
}

It is invoked with initial low and high values of 0 and N-1. We can eliminate the tail recursion above and convert this to an iterative implementation:
BinarySearch(A[0..N-1], value) {
    low = 0
    high = N - 1
    while (low <= high) {
        mid = (low + high) / 2
        if (A[mid] > value)
            high = mid - 1
        else if (A[mid] < value)
            low = mid + 1
        else
            return mid // found
    }
    return -1 // not found
}

Some implementations may not include the early termination branch, preferring to check at the end whether the value was found, as shown below. Checking for the value during the search (as opposed to at the end) may seem a good idea, but there are extra computations involved in each iteration of the search. Also, with an array of length N using the low and high indices, the probability of actually finding the value on the first iteration is only 1/N, and the probability of finding it on a later iteration (before the end) is only about 1/(high - low). The following version checks for the value at the end of the search:
low = 0
high = N
while (low < high) {
    mid = (low + high) / 2
    if (A[mid] < value)
        low = mid + 1
    else
        // A[mid] >= value here. Why high can't be set to mid-1:
        // in this branch A[mid] >= value, so if A[mid] == value,
        // high must not move below mid; hence high = mid.
        high = mid
}
if (low < N) and (A[low] == value)
    return low // found
else
    return -1 // not found

This algorithm has two other advantages. At the end of the loop, low points to the first entry greater than or equal to value, so a new entry can be inserted if no match is found. Moreover, it requires only one comparison per iteration, which could be significant for complex keys in languages that do not allow the result of a comparison to be saved. (In practice, a three-way comparison is used more often than two comparisons per loop.) Also, programming implementations using fixed-width integers with modular arithmetic need to account for the possibility of overflow. One frequently used technique for this is to compute mid so that two smaller numbers are ultimately added:

mid = low + ((high - low) / 2)

Equal elements

The elements of the list are not necessarily all unique. If one searches for a value that occurs multiple times in the list, the index returned will be that of the first-encountered equal element, which is not necessarily the first, last, or middle element of the run of equal-key elements, but depends on the positions of the values. Modifying the list, even in seemingly unrelated ways such as adding elements elsewhere, may change the result. To find all equal elements, an upward and a downward linear search can be carried out from the initial result, stopping each search when the element is no longer equal, as sketched below. Thus, e.g. in a table of cities sorted by country, we can find all cities in a given country.
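A minimal Java sketch of this run expansion, assuming found is the index returned by a successful binary search (the method name is illustrative):

static int[] equalRange(int[] a, int found) {
    // Expand from the initial hit to the full run of equal keys.
    int lo = found;
    int hi = found;
    while (lo > 0 && a[lo - 1] == a[found]) lo--;            // downward scan
    while (hi + 1 < a.length && a[hi + 1] == a[found]) hi++; // upward scan
    return new int[] { lo, hi }; // inclusive bounds of the run
}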


Sort key

A list of pairs (p, q) can be sorted based on p alone. Then the comparisons in the algorithm need consider only the values of p, not those of q. For example, in a table of cities sorted on a column "country", we can find cities in Germany by comparing country names with "Germany", instead of comparing whole rows. Such partial content is called a sort key.

Correctness and testing

Binary search is one of the trickiest "simple" algorithms to program correctly. When Jon Bentley assigned it as a problem in a course for professional programmers, he found that an astounding 90 percent failed to code a binary search correctly after several hours of working on it, and another study shows that accurate code for it is found in only five out of twenty textbooks. Given this, the best way to verify the correctness of a binary search implementation is to thoroughly test it on a computer; it is difficult to verify the code by visual inspection without making a mistake.

Performance

Binary search is a logarithmic algorithm and executes in O(log N) time. Specifically, 1 + log2(N) iterations are needed to return an answer. In most cases it is considerably faster than a linear search. It can be implemented using recursion or iteration, as shown above. In some languages it is more elegantly expressed recursively; however, in some C-based languages tail recursion is not eliminated and the recursive version requires more stack space. Binary search can interact poorly with the memory hierarchy (i.e. caching), because of its random-access nature. For in-memory searching, if the interval to be searched is small, a linear search may have superior performance simply because it exhibits better locality of reference. For external searching, care must be taken or each of the first several probes will lead to a disk seek. A common technique is to switch from binary searching to linear searching as soon as the size of the remaining interval falls below a small value such as 8 or 16. When multiple binary searches are to be performed with the same key in related lists, fractional cascading can be used to speed up successive searches after the first one.

Examples

An example of binary search in action is a simple guessing game in which a player has to guess a positive integer, between 1 and N, selected by another player, using only questions answered with yes or no. Supposing N is 16 and the number 11 is selected, the game might proceed as follows.

Is the number greater than 8? (Yes)
Is the number greater than 12? (No)
Is the number greater than 10? (Yes)
Is the number greater than 11? (No)

Therefore, the number must be 11. At each step, we choose a number right in the middle of the range of possible values for the number. For example, once we know the number is greater than 8, but less than or equal to 12, we know to choose a number in the middle of the range [9, 12] (in this case 10 is optimal). At most log2(N) questions are required to determine the number, since each question halves the search space. Note that one less question (iteration) is required than for the general algorithm, since the number is constrained to a particular range. Even if the number we are guessing can be arbitrarily large, in which case there is no upper bound N, we can still find the number in at most 2·log2(k) steps (where k is the unknown selected number) by first finding an upper bound by repeated doubling. For example, if the number were 11, we could use the following sequence of guesses to find it:

Is the number greater than 1? (Yes)
Is the number greater than 2? (Yes)
Is the number greater than 4? (Yes)
Is the number greater than 8? (Yes)
Is the number greater than 16? (No; take N = 16 and proceed as above)

(We now know the number is greater than 8.)
Is the number greater than 12? (No)
Is the number greater than 10? (Yes)
Is the number greater than 11? (No)

As one simple example, in revision control systems, it is possible to use a binary search to see in which revision a piece of content was added to a file. We simply do a binary search through the entire version history; if the content is not present in a particular version, it appeared later, while if it is present, it appeared at that version or sooner. This is far quicker than checking every difference. There are many occasions unrelated to computers when a binary search is the quickest way to isolate a solution we seek. In troubleshooting a single problem with many possible causes, we can change half the suspects, see if the problem remains, and deduce in which half the culprit is; change half the remaining suspects, and so on. See: shotgun debugging. People typically use a mixture of the binary search and interpolative search algorithms when searching a telephone book: after the initial guess, we exploit the fact that the entries are sorted and can rapidly find the required entry. For example, when searching for Smith, if Rogers and Thomas have been found, one can flip to the page halfway between the previous guesses; if this shows Samson, we know that Smith is somewhere between the Samson and Thomas pages, so we can bisect these.

Language support

Many standard libraries provide a way to do binary search. C provides bsearch in its standard library. C++'s STL provides the algorithm functions binary_search, lower_bound and upper_bound. Java offers a set of overloaded binarySearch() static methods in the classes Arrays and Collections for performing binary searches on Java arrays and Lists, respectively. They must be arrays of primitives, or the arrays or Lists must be of a type that implements the Comparable interface, or you must specify a custom Comparator object. Microsoft's .NET Framework 2.0 offers static generic versions of the binary search algorithm in its collection base classes; an example would be System.Array's method BinarySearch<T>(T[] array, T value). Python provides the bisect module. COBOL can perform binary search on internal tables using the SEARCH ALL statement.

External links

- NIST Dictionary of Algorithms and Data Structures: binary search
- Sparknotes: Binary search. Simplified overview of binary search.
- Binary Search Implementation in Visual Basic .NET (partially in English)
- msdn2.microsoft.com/en-us/library/2cy9f6wb.aspx - .NET Framework Class Library, Array.BinarySearch Generic Method (T[], T)
- Implementations of binary search on LiteratePrograms.
- Explained and commented binary search algorithm in C++
- Binary Search using C++

Interpolation search
Interpolation search is an algorithm for searching for a given key value in an indexed array that has been ordered by the values of the key. It parallels how humans search through a telephone book for a particular name, the key value by which the book's entries are ordered. In each search step it calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will work only if calculations on the size of differences between key values are sensible: humans might accept that "Baker" is closer to "Abadeeah" than it is to "Zyteca" and interpolate accordingly, but computer representations of text strings do not facilitate such arithmetic.

By comparison, the binary search always chooses the middle of the remaining search space, discarding one half or the other, again depending on the comparison between the key value found at the estimated position and the key value sought. The linear search uses equality only, as it compares elements one by one from the start, ignoring any sorting.

On average the interpolation search makes about O(log (log n)) comparisons (if the elements are uniformly distributed), where n is the number of elements to be searched. In the worst case (for instance where the numerical values of the keys increase exponentially) it can make up to O(n) comparisons. Perl, Itai, and Avni note that "when the difference between the indices of two successive iterations is small, it may be advantageous to switch to sequential search". Unless the data is very uniform, there are millions of records, or comparisons are very time-consuming, a binary search may be no slower: interpolation is usually time-consuming on a computer, and a binary search takes only log(n) comparisons anyway. In reality, an interpolation search is often no faster than a binary search, due to the complexity of the arithmetic calculations for estimating the next probe position.

The key may not be easily regarded as a number, as when it is text such as names in a telephone book. Yet it is always possible to regard a bit pattern as a number, and text could be considered a number in base twenty-six (or however many distinct symbols are allowable) that could then be used in the interpolation arithmetic, but each such conversion is more trouble than a few binary-chop probes. Nevertheless, a suitable numeric value could be pre-computed for every name, and searching would employ those numerical values. Further, it may be easy enough to form a cumulative histogram of those values that follows a shape that can be approximated by a simple function (such as a straight line) or a few such curves over segments of the range. If so, then a suitable inverse interpolation is easily computed and the desired index found with very few probes. The particular advantage of this is that a probe of a particular index's value may involve access to slow storage (on disc, or not in the on-chip memory), whereas the interpolation calculation involves a small local collection of data which, with many searches being performed, is likely in fast storage. This approach shades into being a special-case hash table search. Such analysis should also detect the troublesome case of equal key values: if a run of equal key values exists, then the search will not necessarily select the first (or last) element of such a run, and, if not carefully written, runs the risk of attempting a division by zero during its interpolation calculation.

In the simple case, no analysis of values, still less cumulative histogram approximations, is prepared. With a sorted list, clearly the minimum value is at index one and the maximum value is at index n. Assuming a uniform distribution of values amounts to assuming that the cumulative histogram's shape is a straight line from the minimum to the maximum, and so linear interpolation will find the index whose associated value should be the sought value.


The following code example for the Java programming language is a simple implementation. At each stage it computes a probe position and then, as with the binary search, moves either the upper or lower bound in to define a smaller interval containing the sought value. Unlike the binary search, which guarantees a halving of the interval's size with each stage, a misled interpolation may reduce it by only one, hence the O(n) worst case.
public int interpolationSearch(int[] sortedArray, int toFind) {
    // Returns index of toFind in sortedArray, or -1 if not found.
    int low = 0;
    int high = sortedArray.length - 1;
    int mid;
    while (sortedArray[low] < toFind && sortedArray[high] >= toFind) {
        mid = low + ((toFind - sortedArray[low]) * (high - low))
                  / (sortedArray[high] - sortedArray[low]);
        if (sortedArray[mid] < toFind)
            low = mid + 1;
        // Repetition of the comparison code is forced by syntax limitations:
        else if (sortedArray[mid] > toFind)
            high = mid - 1;
        else
            return mid;
    }
    if (sortedArray[low] == toFind)
        return low;
    else
        return -1; // Not found
}

Notice that having probed the list at index mid, for reasons of loop control administration, this code sets either high or low to be not mid but an adjacent index, and that location is then probed during the next iteration. Since an adjacent entry's value will not be much different, the interpolation calculation is not much improved by this one-step adjustment, at the cost of an additional reference to distant memory such as disc. Each iteration of the above code requires between five and six comparisons (the extra is due to the repetitions needed to distinguish the three states of <, > and = via binary comparisons in the absence of a three-way comparison) plus some messy arithmetic, while the binary search algorithm can be written with one comparison per iteration and uses only trivial integer arithmetic. It would thereby search an array of a million elements with no more than twenty comparisons (involving accesses to slow memory where the array elements are stored); to beat that, the interpolation search as written above would be allowed no more than three iterations.

Book-based searching

The conversion of names in a telephone book to some sort of number clearly will not provide numbers having a uniform distribution (except via immense effort such as sorting the names and calling them name #1, name #2, etc.), and further, it is well known that some names are much more common than others (Smith, Jones, etc.). Similarly with dictionaries, where there are many more words starting with some letters than others. Some publishers go to the effort of preparing marginal annotations or even cutting into the side of the pages to show markers for each letter so that at a glance a segmented interpolation can be performed.

External links

- National Institute of Standards and Technology


Tree traversal
Tree traversal refers to the process of visiting each node in a tree in a systematic way when creating the tree, enumerating its nodes, or searching it. Such traversals are classified by the order in which the nodes are visited.

Traversal methods

Tree structures can be traversed in different ways. Starting at the root of a binary tree, there are three main steps, and the order in which they are performed defines the traversal type. These operations are repeated at every node, and thus are most easily coded using recursion (see the code sample in the Appendix).

To traverse a non-empty binary tree in preorder (also called depth-first traversal):
1. Visit the node (for example, print the node's data).
2. Traverse the left subtree.
3. Traverse the right subtree.

To traverse a non-empty binary tree in inorder:
1. Traverse the left subtree.
2. Visit the node.
3. Traverse the right subtree.

To traverse a non-empty binary tree in postorder:
1. Traverse the left subtree.
2. Traverse the right subtree.
3. Visit the node.

Trees can also be traversed in level-order (also called breadth-first traversal), where every node on a level is visited before going to a lower level.

Example (the figure of the tree, rooted at F, is not reproduced here):

Preorder traversal sequence: F, B, A, D, C, E, G, I, H
Inorder traversal sequence: A, B, C, D, E, F, G, H, I (note that the inorder traversal of this binary search tree yields an ordered list)
Postorder traversal sequence: A, C, E, D, B, H, I, G, F
Level-order traversal sequence: F, B, G, A, D, I, C, E, H


Sample implementations

preorder(node)
    print node.value
    if node.left ≠ null then preorder(node.left)
    if node.right ≠ null then preorder(node.right)

inorder(node)
    if node.left ≠ null then inorder(node.left)
    print node.value
    if node.right ≠ null then inorder(node.right)

postorder(node)
    if node.left ≠ null then postorder(node.left)
    if node.right ≠ null then postorder(node.right)
    print node.value

All three sample implementations require stack space proportional to the height of the tree. In a poorly balanced tree, this can be quite considerable. We can remove the stack requirement by threading the tree. See threaded binary trees for more information.

Uses

Inorder traversal

It is particularly common to use an inorder traversal on a binary tree because this will return values from the underlying set in order, according to the comparison criteria that set up the binary tree (or binary search tree, hence the name). To see why this is the case, note that if n is a node in a binary search tree, then everything in n's left subtree is less than n, and everything in n's right subtree is greater than or equal to n. Thus, if we visit the left subtree in order, using a recursive call, then visit n, and then visit the right subtree in order, we have visited the entire subtree rooted at n in order. Traversing in reverse inorder similarly gives the values in decreasing order.

Iterative traversing

All the above recursive algorithms require stack space proportional to the depth of the tree. A recursive traversal may be converted into an iterative one using various well-known methods, for example by managing an explicit stack, as sketched below.
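For example, a sketch in Java of an inorder traversal driven by an explicit stack (the Node class with value/left/right fields is an assumption of this sketch):

import java.util.ArrayDeque;
import java.util.Deque;

class Node {
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

class InorderIterative {
    // Visits nodes in sorted order for a binary search tree,
    // using an explicit stack instead of recursion.
    static void inorder(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        Node current = root;
        while (current != null || !stack.isEmpty()) {
            while (current != null) {   // descend as far left as possible
                stack.push(current);
                current = current.left;
            }
            current = stack.pop();      // visit the leftmost unvisited node
            System.out.println(current.value);
            current = current.right;    // then traverse its right subtree
        }
    }
}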


Tree search

Search in a binary search tree always starts from the root node and goes through the nodes of one of the subtrees, selecting the right or left link depending on the comparison with the searched item, until the node is found.
//Definitions of the used classes:
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

class CNode
{
private:
    CNode *m_pRight;  //subtree to the right side
    CNode *m_pLeft;   //subtree to the left side
public:
    string m_sWord;   //word stored in the node
    friend class CTree;
};

class CTree
{
private:
    CNode *m_pRoot;   //the top of the tree
    //insert a new node:
    void m_InsertWord(CNode *&m_rpNewNode, const string& m_sWord);
    //print a single node:
    void m_PrintWord(CNode *m_pTop);
    //find a node:
    CNode* FindWord(CNode* m_pCurrent, const string& sItem);
public:
    CTree() { m_pRoot = NULL; }
    //add a new word to the tree:
    void m_Insert(string& m_sWord) { m_InsertWord(m_pRoot, m_sWord); }
    //print the tree:
    void m_Print() { m_PrintWord(m_pRoot); }
    void m_Find(string& Item) { FindWord(m_pRoot, Item); }
};

static CTree CInstanceWords;  //list of words

//Function to find a certain word in the tree structure:
CNode* CTree::FindWord(CNode* m_pCurrent, const string& sItem)
{
    //Preorder traversal:
    if (m_pCurrent == NULL)
    {
        printf("\nWord is not found.\n");
        return NULL;
    }
    else if (sItem == m_pCurrent->m_sWord)
    {
        printf("\nWord \"");
        cout << m_pCurrent->m_sWord;
        printf("\" is found at: %p\n", (void*)m_pCurrent);
        return m_pCurrent;
    }
    else if (sItem < m_pCurrent->m_sWord)
        return FindWord(m_pCurrent->m_pLeft, sItem);
    else
        return FindWord(m_pCurrent->m_pRight, sItem);
}


See also

- Tree programming
- Binary search tree description from NIST.



Sorting algorithms
Comparison based sorting
    Exchange sorting: Bubble sort
    Insert and keep sorted: Insertion sort, Tree sort
    Priority queue sorting: Selection sort, Heap sort
    Divide and conquer: Quicksort, Merge sort
    Diminishing increment sorting: Shell sort
Address based sorting
    Proxmap sort, Radix sort

Theory: Computational complexity theory | Big O notation | Total order | Lists | Stability | Comparison sort
Exchange sorts: Bubble sort | Cocktail sort | Comb sort | Gnome sort | Quicksort
Selection sorts: Selection sort | Heapsort | Smoothsort | Strand sort
Insertion sorts: Insertion sort | Shell sort | Tree sort | Library sort | Patience sorting
Merge sorts: Merge sort
Non-comparison sorts: Radix sort | Bucket sort | Counting sort | Pigeonhole sort | Tally sort
Others: Topological sorting | Sorting network


In computer science and mathematics, a sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important to optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also used for structuring data and for producing human-readable output. More formally, the output must satisfy two conditions:
1. the output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
2. the output is a permutation, or reordering, of the input.

The sorting problem has always attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956. Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, and time-space tradeoffs.

Classification

Sorting algorithms used in computer science are often classified by:

- Computational complexity (worst, average and best behaviour) of element comparisons in terms of the size of the list (n). For typical sorting algorithms, good behavior is O(n log n) and bad behavior is O(n²). Ideal behavior for a sort is O(n). Sort algorithms which only use an abstract key comparison operation always need Ω(n log n) comparisons in the worst case.
- Computational complexity of swaps (for "in place" algorithms).
- Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place", such that only O(1) or O(log n) memory is needed beyond the items being sorted, while others need to create auxiliary locations for data to be temporarily stored.
- Recursion. Some algorithms are either recursive or non-recursive, while others may be both (e.g., merge sort).
- Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values).
- Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
- General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort.


Stability

Stable sorting algorithms maintain the relative order of records with equal keys (i.e., sort key values). That is, a sorting algorithm is stable if, whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list. When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. However, assume that the following pairs of numbers are to be sorted by their first component:

(4, 1) (3, 7) (3, 1) (5, 6)

In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:

(3, 7) (3, 1) (4, 1) (5, 6)   (order maintained)
(3, 1) (3, 7) (4, 1) (5, 6)   (order changed)

Unstable sorting algorithms may change the relative order of records with equal keys, but stable sorting algorithms never do so. Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original input as a tie-breaker (sketched in code below). Remembering this order, however, often involves an additional space cost. Sorting based on a primary, secondary, tertiary, etc. sort key can be done by any sorting method, taking all sort keys into account in comparisons (in other words, using a single composite sort key). If a sorting method is stable, it is also possible to sort multiple times, each time with one sort key. In that case the keys need to be applied in order of increasing priority.
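A minimal Java sketch of the index tie-breaker technique on the pairs above (note that Java's Arrays.sort on objects is already stable; the explicit index column shown here is the general way to force stability with any comparison sort):

import java.util.Arrays;
import java.util.Comparator;

class StableByIndex {
    public static void main(String[] args) {
        int[][] pairs = { {4, 1}, {3, 7}, {3, 1}, {5, 6} };
        // Extend each record with its original position.
        int n = pairs.length;
        int[][] keyed = new int[n][];
        for (int i = 0; i < n; i++) {
            keyed[i] = new int[] { pairs[i][0], pairs[i][1], i };
        }
        // Compare by the first component, breaking ties by original index:
        Arrays.sort(keyed, Comparator.<int[]>comparingInt(p -> p[0])
                                     .thenComparingInt(p -> p[2]));
        for (int[] p : keyed) {
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
        // Prints (3, 7), (3, 1), (4, 1), (5, 6): order maintained.
    }
}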

Comparison of algorithms
In this table, n is the number of records to be sorted. The columns "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory overhead" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. These are all comparison sorts.
Name                 Average case   Worst case     Memory overhead  Stable  Method        Other notes
Bubble sort          O(n²)          O(n²)          O(1)             Yes     Exchanging    See example implementation in [1]
Cocktail sort        O(n²)          O(n²)          O(1)             Yes     Exchanging
Comb sort            -              O(n²)          O(1)             No      Exchanging    Small code size
Gnome sort           O(n²)          O(n²)          O(1)             Yes     Exchanging
Selection sort       O(n²)          O(n²)          O(1)             No      Selection
Insertion sort       O(n + d)       O(n²)          O(1)             Yes     Insertion     d is the number of inversions
Shell sort           -              O(n log² n)    O(1)             No      Insertion
Binary tree sort     O(n log n)     O(n log n)     O(n)             Yes     Insertion     When using a self-balancing binary search tree
Library sort         O(n log n)     O(n²)          O(n)             Yes     Insertion
Merge sort           O(n log n)     O(n log n)     O(n)             Yes     Merging
In-place merge sort  O(n log n)     O(n log n)     O(1)             Yes     Merging
Heapsort             O(n log n)     O(n log n)     O(1)             No      Selection
Smoothsort           -              O(n log n)     O(1)             No      Selection
Quicksort            O(n log n)     O(n²)          O(log n)         No      Partitioning  Can be implemented as a stable sort; can be O(n log n) worst case if median pivot is used
Introsort            O(n log n)     O(n log n)     O(log n)         No      Hybrid        Used in most implementations of the STL
Patience sorting     -              O(n log n)     O(n)             No      Insertion     Finds all the longest increasing subsequences within O(n log n)
Strand sort          O(n log n)     O(n²)          O(n)             Yes     Selection

The following table describes sorting algorithms that are not comparison sorts. As such, they are not limited by an Ω(n log n) lower bound. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and s, the memory chunk size used by the implementation. Many of them are based on the assumption that the key size k is large enough that all entries have unique key values, and hence that n << 2^k, where << means "much less than".
Name             Average case    Worst case            Memory overhead  Stable  n << 2^k?  Notes
Pigeonhole sort  O(n + 2^k)      O(n + 2^k)            O(2^k)           Yes     Yes
Bucket sort      O(n·k)          O(n²·k)               O(n·k)           Yes     No         Assumes uniform distribution of elements from the domain in the array
Counting sort    O(n + 2^k)      O(n + 2^k)            O(n + 2^k)       Yes     Yes
LSD Radix sort   O(n·k/s)        O(n·k/s)              O(n)             Yes     No
MSD Radix sort   O(n·k/s)        O(n·(k/s)·2^s)        O((k/s)·2^s)     No      No
Spreadsort       O(n·k/log n)    O(n·(k − log n)^0.5)  O(n)             No      No         Asymptotics are based on the assumption that n << 2^k, but the algorithm does not require this

Often used methods


Bubble sort
Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science education. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. While simple, this algorithm is highly inefficient and is rarely used except in education. A slightly better variant, cocktail sort, works by inverting the ordering criteria and the pass direction on alternating passes. Its average case and worst case are both O(n²).
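A minimal cocktail sort sketch in C++ (illustrative; the function name and array representation are assumptions):

#include <algorithm> // std::swap

void cocktailSort(int a[], int n)
{
    bool swapped = true;
    int lo = 0, hi = n - 1;
    while (swapped)
    {
        swapped = false;
        for (int i = lo; i < hi; ++i)          // forward pass: bubble the maximum up
            if (a[i] > a[i + 1]) { std::swap(a[i], a[i + 1]); swapped = true; }
        --hi;                                  // the largest element is now in place
        for (int i = hi; i > lo; --i)          // backward pass: bubble the minimum down
            if (a[i - 1] > a[i]) { std::swap(a[i - 1], a[i]); swapped = true; }
        ++lo;                                  // the smallest element is now in place
    }
}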

Selection sort
Selection sort is a simple sorting algorithm that improves on the performance of bubble sort. It works by first finding the smallest element using a linear scan and swapping it into the first position in the list, then finding the second smallest element by scanning the remaining elements, and so on. Selection sort is unique compared to almost any other algorithm in that its running time is not affected by the prior ordering of the list: it performs the same number of operations because of its simple structure. Selection sort requires (n - 1) swaps and hence O(n) memory writes. However, selection sort requires (n - 1) + (n - 2) + ... + 2 + 1 = n(n - 1)/2 = O(n²) comparisons. Thus it can be very attractive if writes are the most expensive operation, but otherwise selection sort will usually be outperformed by insertion sort or the more complicated algorithms.


Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists, and is often used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. In arrays, the new list and the remaining elements can share the array's space, but insertion is expensive, requiring all following elements to be shifted over by one. The insertion sort works just like its name suggests - it inserts each item into its proper place in the final list. The simplest implementation of this requires two list structures - the source list and the list into which sorted items are inserted. To save memory, most implementations use an in-place sort that works by moving the current item past the already sorted items and repeatedly swapping it with the preceding item until it is in place (a minimal sketch appears after this overview). This method is much more efficient than the bubble sort, though it has more constraints.

Shell sort
Shell sort, invented by Donald Shell in 1959, is a variant of insertion sort that is more efficient for larger lists. It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort. Although this method is inefficient for large data sets, it is one of the fastest algorithms for sorting small numbers of elements (sets with fewer than 1000 or so elements). Another advantage of this algorithm is that it requires relatively small amounts of memory.

Merge sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on, until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n).

Heapsort
Heapsort is a much more efficient version of selection sort. It also works by determining the largest (or smallest) element of the list, placing that at the end (or beginning) of the list, then continuing with the rest of the list, but accomplishes this task efficiently by using a data structure called a heap, a special type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to be the largest element. When it is removed and placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root. Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows heapsort to run in O(n log n) time.

Quicksort
Quicksort is a "divide and conquer" algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot it runs in O(n log n).

Bucket sort
Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sorting algorithm. A variation of this method, called the single buffered count sort, is reported to be faster than quicksort and to take about the same time to run on any set of data.

Radix sort
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n·k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving their relative order using a stable sort. Then we sort them by the next bit, and so on from right to left, and the list will end up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the number of values a bit can have is small.
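The in-place insertion sort mentioned in the overview above, as a minimal C++ sketch (the function name and array representation are illustrative):

void insertionSort(int a[], int n)
{
    for (int i = 1; i < n; ++i)
    {
        int current = a[i];
        int j = i - 1;
        // Shift the already-sorted elements right until current's place is found.
        while (j >= 0 && a[j] > current)
        {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = current;
    }
}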

Memory usage
When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from the disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements to one another happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".

Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantage of the strengths of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is less efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.

Techniques can also be combined. For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.
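A hedged C++ sketch of the "tag sort" idea just described; the BigRecord type and sizes are assumptions. Only the small index array is moved around, so the large records are never copied during the sort:

#include <algorithm>
#include <cstddef>
#include <vector>

struct BigRecord { int key; char payload[1024]; }; // illustrative record

struct IndexByKey
{
    const std::vector<BigRecord>* records;
    explicit IndexByKey(const std::vector<BigRecord>& r) : records(&r) {}
    bool operator()(std::size_t a, std::size_t b) const
    {
        return (*records)[a].key < (*records)[b].key; // compare keys only
    }
};

std::vector<std::size_t> tagSort(const std::vector<BigRecord>& records)
{
    std::vector<std::size_t> index(records.size());
    for (std::size_t i = 0; i < index.size(); ++i)
        index[i] = i;
    std::sort(index.begin(), index.end(), IndexByKey(records));
    return index; // index[0] is the position of the smallest record, etc.
}

A sorted copy of the records, if one is needed at all, can then be produced with a single sequential pass that reads through the index.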

See also

- Big O notation
- External sorting
- Sorting networks (compare)
- Collation
- Schwartzian transform
- Shuffling algorithms
- Search algorithms
- Wikibooks: Algorithms - uses sorting a deck of cards with many sorting algorithms as an example

Notes and references

1. http://www.cs.duke.edu/~ola/papers/bubble.pdf
2. Y. Han. Deterministic sorting in O(n log log n) time and linear space. Proceedings of the thirty-fourth annual ACM Symposium on Theory of Computing, Montreal, Quebec, Canada, 2002, pp. 602-608.
3. M. Thorup. Randomized Sorting in O(n log log n) Time and Linear Space Using Addition, Shift, and Bit-wise Boolean Operations. Journal of Algorithms, Volume 42, Number 2, February 2002, pp. 205-230.
4. Y. Han, M. Thorup. Integer Sorting in O(n √(log log n)) Expected Time and Linear Space. Proceedings of the 43rd Symposium on Foundations of Computer Science, 2002, pp. 135-144.
5. "tag sort" definition
6. D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching.

External links

- Sequential and parallel sorting algorithms - has explanations and analyses of many of these algorithms
- 'Dictionary of Algorithms, Data Structures, and Problems'
- Slightly Skeptical View on Sorting Algorithms - Softpanorama page that discusses several classic algorithms and promotes alternatives to quicksort
- Sorting small arrays - C++ code and analysis looking at the best (fastest) way to sort small (i.e., < 64) items, including special attention on sorting networks
- The Stony Brook Algorithm Repository - a repository of algorithms with source code and lectures
- Graphical Java illustrations of the Bubble sort, Insertion sort, Quicksort, and Selection sort
- xSortLab - an interactive Java demonstration of Bubble, Insertion, Quick, Select and Merge sorts, which displays the data as a bar graph with commentary on the workings of the algorithm printed below the graph
- Sorting contest - an applet visually demonstrating a contest between a number of different sorting algorithms
- The Three Dimensional Bubble Sort - a method of sorting in three or more dimensions (of questionable merit)
- Sort huge amounts of data by doing a multi-phase sorting on temporary file
- AniSort - Java applet visualizing 6 different sorting algorithms
- Naturalorder - Natural Order Numerical Sorting
- Pointers to sorting visualizations - extensive collection of animated sorting algorithms with source code
- OPL booklet of the main sorting algorithms by Michael Lamont
- QiSort - A new O(n log n) algorithm for 2007
- Sorting algorithm demonstrations with source code "tracing"


Comparison based sorting


Bubble sort Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing two items at a time and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which means the list is sorted. The algorithm gets its name from the way smaller elements "bubble" to the top (i.e. the beginning) of the list via the swaps. Because it only uses comparisons to operate on elements, it is a comparison sort. This is the easiest comparison sort to implement.

A simple way to express bubble sort in pseudocode is as follows:

procedure bubbleSort( A : list of sortable items ) defined as:
  do
    swapped := false
    for each i in 0 to length( A ) - 2 do:
      if A[ i ] > A[ i + 1 ] then
        swap( A[ i ], A[ i + 1 ] )
        swapped := true
      end if
    end for
  while swapped
end procedure

The algorithm can also be expressed as:

procedure bubbleSort( A : list of sortable items ) defined as:
  for each i in 1 to length( A ) do:
    for each j in length( A ) downto i + 1 do:
      if A[ j ] < A[ j - 1 ] then
        swap( A[ j ], A[ j - 1 ] )
      end if
    end for
  end for
end procedure


Best-case performance
Bubble sort has best-case complexity O(n). When a list is already sorted, bubble sort will pass through the list once, and find that it does not need to swap any elements. Thus bubble sort will make only n comparisons and determine that the list is completely sorted. It will also use considerably less time than O(n²) if the elements in the unsorted list are not too far from their sorted places.

Variations

Odd-even sort is a parallel version of bubble sort, for message passing systems.

References
- Sorting in the Presence of Branch Prediction and Caches

External links
- C++ Program - Bubble Sort (with an explanation)
- Bubble Sort video and C++ Code
- Practical demonstration


Quicksort

Quicksort is a well-known sorting algorithm developed by C.A.R. Hoare in 1960 that, on average, makes O(n log n) comparisons to sort n items. However, in the worst case, it makes O(n²) comparisons. Typically, quicksort is significantly faster in practice than other O(n log n) algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data it is possible to make design choices which minimize the possibility of requiring quadratic time. As its name implies, quicksort is among the fastest known sorting algorithms in practice, although address-based (non-comparison) sorts can be faster. Quicksort is a comparison sort and is not a stable sort.

The algorithm
Quicksort sorts by employing a divide and conquer strategy to divide a list into two sublists. The steps are:
1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base case of the recursion is lists of size zero or one, which are always sorted. The algorithm always terminates because it puts at least one element in its final place on each iteration (the loop invariant).

In simple pseudocode, the algorithm might be expressed as:

function quicksort(q)
  var list less, pivotList, greater
  if length(q) <= 1
    return q
  select a pivot value pivot from q
  for each x in q
    if x < pivot then add x to less
    if x = pivot then add x to pivotList
    if x > pivot then add x to greater
  return concatenate(quicksort(less), pivotList, quicksort(greater))

Notice that we only examine elements by comparing them to other elements. This makes quicksort a comparison sort.


Version with in-place partition

(Figure: in-place partition of a small list around the pivot 5. The boxed element is the pivot; elements less than or equal to it are moved before it, larger ones after it.)

The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complicated version which uses an in-place partition algorithm and can achieve O(log n) space use on average for good pivot choices:

function partition(array, left, right, pivotIndex)
  pivotValue := array[pivotIndex]
  swap( array, pivotIndex, right )   // Move pivot to end
  storeIndex := left
  for i from left to right - 1
    if array[i] <= pivotValue
      swap( array, storeIndex, i )
      storeIndex := storeIndex + 1
  swap( array, right, storeIndex )   // Move pivot to its final place
  return storeIndex

This form of the partition algorithm is not the original form; multiple variations can be found in various textbooks, such as versions not having the storeIndex. However, this form is probably the easiest to understand. This is the in-place partition algorithm. It partitions the portion of the array between indexes left and right, inclusively, by moving all elements less than or equal to array[pivotIndex] to the beginning of the subarray, leaving all the greater elements following them. In the process it also finds the final position for the pivot element, which it returns. It temporarily moves the pivot element to the end of the subarray, so that it doesn't get in the way. Because it only uses exchanges, the final list has the same elements as the original list. Notice that an element may be exchanged multiple times before reaching its final place. Once we have this, writing quicksort itself is easy:

function quicksort(array, left, right)
  if right > left
    select a pivot index (e.g. pivotIndex := left)
    pivotNewIndex := partition(array, left, right, pivotIndex)
    quicksort(array, left, pivotNewIndex - 1)
    quicksort(array, pivotNewIndex + 1, right)

Randomized quicksort expected complexity
Randomized quicksort has the desirable property that it requires only O(n log n) expected time, regardless of the input. But what makes random pivots a good choice? Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most 2 log₂ n times before reaching lists of size 1, yielding an O(n log n) algorithm.
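A short C++ sketch of random pivot choice (illustrative: partition is a direct translation of the pseudocode above, and rand() stands in for a real random source, seeded elsewhere with srand()):

#include <algorithm> // std::swap
#include <cstdlib>   // std::rand

// Direct translation of the partition pseudocode above.
int partition(int a[], int left, int right, int pivotIndex)
{
    int pivotValue = a[pivotIndex];
    std::swap(a[pivotIndex], a[right]);      // move pivot to the end
    int storeIndex = left;
    for (int i = left; i < right; ++i)
        if (a[i] <= pivotValue)
            std::swap(a[i], a[storeIndex++]);
    std::swap(a[storeIndex], a[right]);      // pivot to its final place
    return storeIndex;
}

void quicksortRandom(int a[], int left, int right)
{
    if (right > left)
    {
        // Random pivot: makes quadratic behavior unlikely on any fixed input.
        int pivotIndex = left + std::rand() % (right - left + 1);
        int p = partition(a, left, right, pivotIndex);
        quicksortRandom(a, left, p - 1);
        quicksortRandom(a, p + 1, right);
    }
}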

Relationship to selection A selection algorithm chooses the kth smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, except that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist which contains the desired element. This small change lowers the average complexity to linear or O(n) time, and makes it an in-place algorithm. A variation on this algorithm brings the worst-case time down to O(n) (see selection algorithm for more information). Conversely, once we know a worst-case O(n) selection algorithm is available, we can use it to find the ideal pivot (the median) at every step of quicksort, producing a variant with worst-case O(n log n) running time. In practical implementations, however, this variant is considerably slower on average.
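A hedged C++ sketch of this quickselect idea, reusing the partition() function from the previous sketch. Here k is a zero-based rank and must satisfy left <= k <= right:

// Returns the k-th smallest element of a[left..right], partially
// reordering the array in the process.
int quickselect(int a[], int left, int right, int k)
{
    while (left < right)
    {
        int p = partition(a, left, right, left + (right - left) / 2);
        if (k == p)
            return a[p];
        else if (k < p)
            right = p - 1;   // the answer lies left of the pivot
        else
            left = p + 1;    // the answer lies right of the pivot
    }
    return a[left];
}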

References
- Brian C. Dean, "A simple expected running time analysis for randomized 'divide and conquer' algorithms." Discrete Applied Mathematics 154(1): 1-5, 2006.
- Hoare, C. A. R. "Partition: Algorithm 63," "Quicksort: Algorithm 64," and "Find: Algorithm 65." Comm. ACM 4(7), 321-322, 1961.

- David Musser. Introspective Sorting and Selection Algorithms, Software Practice and Experience, vol. 27, number 8, pages 983-993, 1997.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 113-122 of section 5.2.2: Sorting by Exchanging.

External links

- Wikibooks Algorithm Implementation has a page on the topic of Quicksort
- Literate implementations of Quicksort in various languages on LiteratePrograms
- Quicksort tutorial with illustrated examples


Heapsort

Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort.

Overview
Heapsort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is extracted until none remain, the values having been extracted in sorted order. During extraction, the only space required is that needed to store the heap. In order to achieve constant space overhead, the heap is stored in the part of the input array that has not yet been sorted (the structure of this heap is described at "Binary heap: Heap implementation"). Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.

Comparison with other sorts
Heapsort primarily competes with quicksort, another very efficient, general-purpose, nearly-in-place comparison-based sort algorithm. Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n²), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.
Heapsort also competes with merge sort, which has the same time bounds, but requires O(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:
- Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
- Merge sort is a stable sort.
- Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
- Merge sort can be easily adapted to operate on linked lists and very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times.
An interesting alternative to heapsort is introsort, which combines quicksort and heapsort to retain the advantages of both: the worst-case speed of heapsort and the average speed of quicksort.

Pseudocode
The following is the "simple" way to implement the algorithm, in pseudocode, where swap is used to swap two elements of the array. Notice that the arrays are zero-based in this example.


function heapSort(a, count) is
  input: an unordered array a of length count
  (first place a in max-heap order)
  heapify(a, count)
  end := count - 1
  while end > 0 do
    (swap the root (maximum value) of the heap with the last element of the heap)
    swap(a[end], a[0])
    (decrease the size of the heap by one so that the previous max value will stay in its proper placement)
    end := end - 1
    (put the heap back in max-heap order)
    siftDown(a, 0, end)

function heapify(a, count) is
  (start is assigned the index in a of the last parent node)
  start := (count - 1) / 2
  while start >= 0 do
    (sift down the node at index start to the proper place such that all nodes below the start index are in heap order)
    siftDown(a, start, count - 1)
    start := start - 1
  (after sifting down the root all nodes/elements are in heap order)

function siftDown(a, start, end) is
  input: end represents the limit of how far down the heap to sift
  root := start
  while root * 2 + 1 <= end do   (While the root has at least one child)
    child := root * 2 + 1        (root*2+1 points to the left child)
    (If the child has a sibling and the child's value is less than its sibling's...)
    if child < end and a[child] < a[child + 1] then
      child := child + 1         (... then point to the right child instead)
    if a[root] < a[child] then   (out of max-heap order)
      swap(a[root], a[child])
      root := child              (repeat to continue sifting down the child now)
    else
      return

The heapify function can be thought of as building a heap from the bottom up, successively sifting downward to establish the heap property. An alternate version (shown below) that builds the heap top-down and sifts upward is conceptually simpler to grasp. This "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version is O(n), and the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify.

function heapify(a, count) is
  (end is assigned the index of the first (left) child of the root)
  end := 1
  while end < count
    (sift up the node at index end to the proper place such that all nodes above the end index are in heap order)
    siftUp(a, 0, end)
    end := end + 1
  (after sifting up the last node all nodes are in heap order)

function siftUp(a, start, end) is
  input: start represents the limit of how far up the heap to sift; end is the node to sift up
  child := end
  while child > start
    parent := floor((child - 1) / 2)
    if a[parent] < a[child] then   (out of max-heap order)
      swap(a[parent], a[child])
      child := parent              (repeat to continue sifting up the parent now)
    else
      return

References

- J. W. J. Williams. Algorithm 232 - Heapsort, 1964, Communications of the ACM 7(6): 347-348.
- Robert W. Floyd. Algorithm 245 - Treesort 3, 1964, Communications of the ACM 7(12): 701.
- Svante Carlsson. Average-case results on heapsort, 1987, BIT 27(1): 2-17.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 144-155 of section 5.2.3: Sorting by Selection.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapters 6 and 7, respectively: Heapsort and Priority Queues.
- A PDF of Dijkstra's original paper on Smoothsort
- Heaps and Heapsort Tutorial by David Carlson, St. Vincent College


Non-comparison sort
Counting sort
Counting sort is a sorting algorithm which (like bucket sort) takes advantage of knowing the range of the numbers in the array to be sorted (array A). It uses this range to create an array C of this length. Each index i in array C is then used to count how many elements in A have the value i. The counts stored in C can then be used to put the elements in A into their right position in the resulting sorted array. It is less efficient than pigeonhole sort.

Characteristics of counting sort
Counting sort is a stable sort and has a running time of Θ(n + k), where n and k are the lengths of the arrays A (the input array) and C (the counting array), respectively. In order for this algorithm to be efficient, k must not be much larger than n.
The indices of C must run from the minimum to the maximum value in A to be able to index C directly with the values of A. Otherwise, the values of A will need to be translated (shifted), so that the minimum value of A matches the smallest index of C. (Translation by subtracting the minimum value of A from each element to get an index into C therefore gives a counting sort. If a more complex function is used to relate values in A to indices into C, it is a bucket sort.) If the minimum and maximum values of A are not known, an initial pass over the data will be necessary to find these (this pass will take time O(n); see selection algorithm).
The length of the counting array C must be at least equal to the range of the numbers to be sorted (that is, the maximum value minus the minimum value plus 1). This makes counting sort impractical for large ranges in terms of time and memory needed. Counting sort may for example be the best algorithm for sorting numbers whose range is between 0 and 100, but it is probably unsuitable for sorting a list of names alphabetically (again, see bucket sort and pigeonhole sort). However, counting sort can be used in radix sort to sort a list of numbers whose range is too large for counting sort to be suitable alone.
Because counting sort uses key values as indexes into an array, it is not a comparison sort, and the Ω(n log n) lower bound for sorting is inapplicable.

The algorithm
Informal description:
1. Find the highest and lowest elements of the set.
2. Count the different elements in the array (for example, [4,4,4,1,1] would give three 4's and two 1's).
3. Accumulate the counts (for example, starting from the first element in the new set of counts, add the current element to the previous).
4. Fill the destination array from backwards: put each element into its count-th position. Each time you put in a new element, decrease its count.

C++ implementation
// CountingSort.cpp : Defines the entry point for the console application.
#include <iostream.h>
#include <stdio.h>
#include <conio.h>

///
/// countingSort - sort an array of values.
/// For best results the range of values to be sorted should not be
/// significantly larger than the number of elements in the array.
///
/// param nums - input - array of values to be sorted
/// param size - input - number of elements in the array
///
void countingSort(int *nums, int size);
void print(int nums[]);

void main()
{
   int nums[] = { 1, 2, 4, 6, 7, 7, 0, 6, 3, 5};
   countingSort(nums, 10);
   print(nums);
}

void countingSort(int *nums, int size)
{
   // search for the minimum and maximum values in the input
   int i, min = nums[0], max = min;
   for(i = 1; i < size; ++i)
   {
      if (nums[i] < min)
         min = nums[i];
      else if (nums[i] > max)
         max = nums[i];
   }

   // 1. Create a counting array "counts" with a member for
   //    each possible discrete value in the input.
   //    Initialize all counts to 0:
   int distinct_element_count = max - min + 1;
   int* counts = new int[distinct_element_count];
   for(i = 0; i < distinct_element_count; ++i)
      counts[i] = 0;

   // 2. Count how many times each value occurs in the input:
   for(i = 0; i < size; ++i)
      ++counts[ nums[i] - min ];

   // 3. Store the elements back into the array, in order:
   int j = 0;
   for(i = min; i <= max; i++)
      for(int z = 0; z < counts[i - min]; z++)
         nums[j++] = i;

   delete[] counts;
}

void print(int nums[])
{
   for (int i = 0; i < 10; i++)
   {
      cout << nums[i] << "-";
   }
   cout << endl;
}

References
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 8.2: Counting sort, pp. 168-170.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition. Addison-Wesley, 1998. ISBN 0-201-89685-0. Section 5.2, Sorting by counting, pp. 75-80.
- Seward, Harold H. Information sorting in the application of electronic digital computers to business operations. Master's thesis, MIT, 1954.


Appendix
Optimization in search and sorting
(from MSDN, Visual C++ Programmer's Guide, "Tips for improving time-critical code", chapter "Sorting and searching")

Sorting is an inherently time-consuming operation (compared to other typical operations you might have to do). The best way to speed up code that is slow due to sorting is to find a way to not sort at all.

- Perhaps the sort can be deferred to a non-performance-critical time.
- Perhaps only part of the data truly needs to be sorted.
- Perhaps information was available at some other time that would have made the sort easier to do later.

You might be able to build the list in sorted order. But this might slow you down overall because inserting each new element in order could require a more complicated data structure with possibly worse locality of reference. There are no easy answers; try several approaches and measure the differences. If you find that you must sort there are a few things to remember:

- If possible, use an already tested sort algorithm; writing your own is a recipe for bugs.
- Anything you can do before the sort that simplifies what you need to do during the sort is probably a good thing. If a sort runs in O(n log n) and you can make a one-time pass over your data to simplify the comparisons such that the sort can now run in O(n), you're well ahead.
- Think about the locality of reference of the sort algorithm you're using versus the data that you expect it to run on.
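A minimal C++ sketch of the "one-time pass" tip (the case-folding key and all names are illustrative): the expensive per-element transformation is computed once per element instead of once per comparison.

#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The "expensive" per-element work: fold a string to lower case.
static std::string foldCase(const std::string& s)
{
    std::string out(s);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (char)std::tolower((unsigned char)out[i]);
    return out;
}

void sortCaseInsensitive(std::vector<std::string>& words)
{
    typedef std::pair<std::string, std::string> Keyed; // (cached key, original word)
    std::vector<Keyed> keyed;
    keyed.reserve(words.size());
    for (std::size_t i = 0; i < words.size(); ++i)
        keyed.push_back(Keyed(foldCase(words[i]), words[i])); // one pass: n key computations
    std::sort(keyed.begin(), keyed.end()); // O(n log n) comparisons on the cheap cached keys
    for (std::size_t i = 0; i < words.size(); ++i)
        words[i] = keyed[i].second;
}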

There are fewer alternatives to searching than for sorting if you want to improve performance. If you need to do a search and if the search is time-critical, a binary search or hash table lookup is almost always the right answer. But again keep locality in mind. A linear search through a small array might be faster than a binary search through a data structure with a lot of pointers, which could cause page faults or cache misses.
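For instance, a binary search over a plain sorted array keeps every probe inside one contiguous block of memory (a minimal C++ sketch; the data is illustrative):

#include <algorithm>
#include <vector>

// Returns true if key occurs in the sorted vector; all probes stay
// within the vector's contiguous storage.
bool containsSorted(const std::vector<int>& sorted, int key)
{
    std::vector<int>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), key);
    return it != sorted.end() && *it == key;
}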


C/C++ functions
bsearch - performs a binary search of a sorted array.

void *bsearch( const void *key, const void *base, size_t num, size_t width,
               int ( __cdecl *compare ) ( const void *elem1, const void *elem2 ) );

Routine: bsearch
Required header: <stdlib.h> and <search.h>
Compatibility: ANSI, Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB - single-thread static library, retail version
LIBCMT.LIB - multithread static library, retail version
MSVCRT.LIB - import library for MSVCRT.DLL, retail version

Return value
bsearch returns a pointer to an occurrence of key in the array pointed to by base. If key is not found, the function returns NULL. If the array is not in ascending sort order or contains duplicate records with identical keys, the result is unpredictable.

Parameters
key - object to search for
base - pointer to base of search data
num - number of elements
width - width of elements
compare - function that compares two elements: elem1 and elem2
elem1 - pointer to the key for the search
elem2 - pointer to the array element to be compared with the key

Remarks
The bsearch function performs a binary search of a sorted array of num elements, each of width bytes in size. The base value is a pointer to the base of the array to be searched, and key is the value being sought. The compare parameter is a pointer to a user-supplied routine that compares two array elements and returns a value specifying their relationship. bsearch calls the compare routine one or more times during the search, passing pointers to two array elements on each call. The compare routine compares the elements, then returns one of the following values:


Value returned by compare routine    Description
< 0                                  elem1 less than elem2
0                                    elem1 equal to elem2
> 0                                  elem1 greater than elem2

Note: you can search for either ASCII strings or hexadecimal bytes. For example, to find "Hello", you can search for either the string "Hello" or for "48 65 6C 6C 6F" (the hexadecimal equivalent). Example
/* BSEARCH.C: This program reads the command-line
 * parameters, sorting them with qsort, and then
 * uses bsearch to find the word "cat".
 */

#include <search.h>
#include <string.h>
#include <stdio.h>

int compare( char **arg1, char **arg2 );   // Declare a function for compare

void main( int argc, char **argv )
{
   char **result;
   char *key = "cat";
   int i;

   /* Sort using Quicksort algorithm: */
   qsort( (void *)argv, (size_t)argc, sizeof( char * ),
          (int (*)(const void*, const void*))compare );

   for( i = 0; i < argc; ++i )        /* Output sorted list */
      printf( "%s ", argv[i] );

   /* Find the word "cat" using a binary search algorithm: */
   result = (char **)bsearch( (char *) &key, (char *)argv, argc,
                              sizeof( char * ),
                              (int (*)(const void*, const void*))compare );
   if( result )
      printf( "\n%s found at %Fp\n", *result, result );
   else
      printf( "\nCat not found!\n" );
}

int compare( char **arg1, char **arg2 )
{
   /* Compare all of both strings: */
   return _strcmpi( *arg1, *arg2 );
}

Output

[C:\work]bsearch dog pig horse cat human rat cow goat
bsearch cat cow dog goat horse human pig rat
cat found at 002D0008


_lfind - performs a linear search for the specified key.

void *_lfind( const void *key, const void *base, unsigned int *num, unsigned int width,
              int (__cdecl *compare)(const void *elem1, const void *elem2) );

Routine: _lfind
Required header: <search.h>
Compatibility: Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB - single-thread static library, retail version
LIBCMT.LIB - multithread static library, retail version
MSVCRT.LIB - import library for MSVCRT.DLL, retail version

Return value
If the key is found, _lfind returns a pointer to the element of the array at base that matches key. If the key is not found, _lfind returns NULL.

Parameters
key - object to search for
base - pointer to base of search data
num - number of array elements
width - width of array elements
compare - pointer to comparison routine
elem1 - pointer to key for search
elem2 - pointer to array element to be compared with key

Remarks
The _lfind function performs a linear search for the value key in an array of num elements, each of width bytes in size. Unlike bsearch, _lfind does not require the array to be sorted. The base argument is a pointer to the base of the array to be searched. The compare argument is a pointer to a user-supplied routine that compares two array elements and then returns a value specifying their relationship. _lfind calls the compare routine one or more times during the search, passing pointers to two array elements on each call. The compare routine must compare the elements, then return nonzero, meaning the elements are different, or 0, meaning the elements are identical.


Example:
/* LFIND.C: This program uses _lfind to search for
 * the word "hello" in the command-line arguments.
 */

#include <search.h>
#include <string.h>
#include <stdio.h>

int compare( const void *arg1, const void *arg2 );

void main( unsigned int argc, char **argv )
{
   char **result;
   char *key = "hello";

   result = (char **)_lfind( &key, argv, &argc, sizeof(char *), compare );
   if( result )
      printf( "%s found\n", *result );
   else
      printf( "hello not found!\n" );
}

int compare( const void *arg1, const void *arg2 )
{
   return( _stricmp( * (char**)arg1, * (char**)arg2 ) );
}

Output
[C:\code]lfind Hello
Hello found


_lsearch - performs a linear search for a value. Adds to end of list if not found.

void *_lsearch( const void *key, void *base, unsigned int *num, unsigned int width,
                int (__cdecl *compare)(const void *elem1, const void *elem2) );

Routine: _lsearch
Required header: <search.h>
Compatibility: Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB - single-thread static library, retail version
LIBCMT.LIB - multithread static library, retail version
MSVCRT.LIB - import library for MSVCRT.DLL, retail version

Return value
If the key is found, _lsearch returns a pointer to the element of the array at base that matches key. If the key is not found, _lsearch returns a pointer to the newly added item at the end of the array.

Parameters
key - object to search for
base - pointer to base of array to be searched
num - number of elements
width - width of each array element
compare - pointer to comparison routine
elem1 - pointer to key for search
elem2 - pointer to array element to be compared with key

Remarks
The _lsearch function performs a linear search for the value key in an array of num elements, each of width bytes in size. Unlike bsearch, _lsearch does not require the array to be sorted. If key is not found, _lsearch adds it to the end of the array and increments num. The compare argument is a pointer to a user-supplied routine that compares two array elements and returns a value specifying their relationship. _lsearch calls the compare routine one or more times during the search, passing pointers to two array elements on each call. compare must compare the elements, then return either nonzero, meaning the elements are different, or 0, meaning the elements are identical.

43

Example:
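A minimal _lsearch sketch (not the original MSDN sample; the word list and values are illustrative). Since "hello" is not in the array, _lsearch appends it and increments num:

/* LSEARCH sketch: the array must have spare room for the appended element. */
#include <search.h>
#include <string.h>
#include <stdio.h>

int compare( const void *arg1, const void *arg2 )
{
   return( _stricmp( * (char**)arg1, * (char**)arg2 ) );
}

void main( void )
{
   char *words[10] = { "alpha", "beta", "gamma" };   /* spare capacity */
   unsigned int num = 3;
   char *key = "hello";

   _lsearch( &key, words, &num, sizeof(char *), compare );

   /* "hello" was not found, so it is now words[3] and num == 4. */
   printf( "%u words, last is %s\n", num, words[num - 1] );
}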


qsort - performs a quick sort.

void qsort( void *base, size_t num, size_t width,
            int (__cdecl *compare )(const void *elem1, const void *elem2 ) );

Routine: qsort
Required header: <stdlib.h> and <search.h>
Compatibility: ANSI, Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB - single-thread static library, retail version
LIBCMT.LIB - multithread static library, retail version
MSVCRT.LIB - import library for MSVCRT.DLL, retail version

Return value
None.

Parameters
base - start of target array
num - array size in elements
width - element size in bytes
compare - comparison function
elem1 - pointer to the key for the search
elem2 - pointer to the array element to be compared with the key

Remarks
The qsort function implements a quick-sort algorithm to sort an array of num elements, each of width bytes. The argument base is a pointer to the base of the array to be sorted. qsort overwrites this array with the sorted elements. The argument compare is a pointer to a user-supplied routine that compares two array elements and returns a value specifying their relationship. qsort calls the compare routine one or more times during the sort, passing pointers to two array elements on each call:

compare( (void *) elem1, (void *) elem2 );

The routine must compare the elements, then return one of the following values:

Return value    Description
< 0             elem1 less than elem2
0               elem1 equivalent to elem2
> 0             elem1 greater than elem2


The array is sorted in increasing order, as defined by the comparison function. To sort an array in decreasing order, reverse the sense of greater than and less than in the comparison function (see the sketch after the example's output below).

Example:
/* QSORT.C: This program reads the command-line
 * parameters and uses qsort to sort them. It
 * then displays the sorted arguments.
 */

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int compare( const void *arg1, const void *arg2 );

void main( int argc, char **argv )
{
   int i;

   /* Eliminate argv[0] from sort: */
   argv++;
   argc--;

   /* Sort remaining args using Quicksort algorithm: */
   qsort( (void *)argv, (size_t)argc, sizeof( char * ), compare );

   /* Output sorted list: */
   for( i = 0; i < argc; ++i )
      printf( "%s ", argv[i] );
   printf( "\n" );
}

int compare( const void *arg1, const void *arg2 )
{
   /* Compare all of both strings: */
   return _stricmp( * ( char** ) arg1, * ( char** ) arg2 );
}

Output:
[C:\code]qsort every good boy deserves favor
boy deserves every favor good
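To sort in decreasing order, the sense of the comparison can simply be reversed. A hedged sketch of a drop-in replacement for the compare routine above; only the argument order changes:

/* compare_desc: reversed comparison gives a descending sort with qsort. */
#include <string.h>

int compare_desc( const void *arg1, const void *arg2 )
{
   /* arg1 and arg2 are deliberately exchanged: */
   return _stricmp( * ( char** ) arg2, * ( char** ) arg1 );
}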


Samples on search algorithms


Linear search

(from http://www.paked.net)
The linear search, or sequential search, is simply examining each element in a list one by one until the desired element is found. The linear search is not very efficient. If the item of data to be found is at the end of the list, then all previous items must be read and checked before the item that matches the search criteria is found.
#include <iostream.h>

int LinearSearch(int [], int, int);

int main()
{
   const int MAX_ITEMS = 10;
   int nums[MAX_ITEMS] = {5,10,22,32,45,67,73,98,99,101};
   int item, location;

   cout << "Enter the item you are searching for: ";
   cin >> item;

   location = LinearSearch(nums, MAX_ITEMS, item);

   if (location > -1)
      cout << "The item was found at index location " << location << endl;
   else
      cout << "The item was not found in the list\n";

   return 0;
}

// this function returns the location of key in the list
// -1 is returned if the value is not found
int LinearSearch(int list[], int size, int key)
{
   int i;
   for (i = 0; i < size; i++)
   {
      if (list[i] == key)
         return i;
   }
   return -1;
}


Binary search

(from http://www.paked.net)

First, the search item is compared with the middle element of the list. If the search item is less than the middle element, we restrict the search to the first half of the list (the elements smaller than the middle element); otherwise, we search the second half.

#include <iostream.h>
#include <stdio.h>

int BinarySearch(int [], int, int);

static int k = 0;

int main()
{
   const int MAX_ITEMS = 20;
   int nums[MAX_ITEMS] = {5,10,22,32,45,67,73,98,99,101,110,118,134,145,156,175,180,190,195,200};
   int item, location;

   cout << "Enter the item you are searching for: ";
   cin >> item;

   location = BinarySearch(nums, MAX_ITEMS, item);

   // output how many iterations it took to find a number:
   printf("Iterations to find an item: %d\n", k);

   if (location > -1)
      cout << "The item " << nums[location] << " was found at index location " << location << endl;
   else
      cout << "The item was not found in the list\n";

   return 0;
}

// this function returns the location of key in the list
// -1 is returned if the value is not found
int BinarySearch(int list[], int size, int key)
{
   int left, right, midpt;
   left = 0;
   right = size - 1;
   while (left <= right)
   {
      midpt = (int) ((left + right) / 2);
      // calculate how many iterations it took to find a number:
      k++;
      if (key == list[midpt])
      {
         return midpt;
      }
      else if (key > list[midpt])
         left = midpt + 1;
      else
         right = midpt - 1;
   }
   return -1;
}


Interpolation search
Interpolation search improves on binary search by estimating, from the key's value relative to the values at the ends of the current range, where in the range the key is likely to lie.

#include <iostream.h>
#include <stdio.h>

int InterpolationSearch(int [], int, int);

static int k = 0;

int main()
{
   const int MAX_ITEMS = 20;
   int nums[MAX_ITEMS] = {5,10,22,32,45,67,73,98,99,101,110,118,134,145,156,175,180,190,195,200};
   int item, location;

   cout << "Enter the item you are searching for: ";
   cin >> item;

   location = InterpolationSearch(nums, MAX_ITEMS, item);

   // output how many iterations it took to find a number:
   printf("Iterations to find an item: %d\n", k);

   if (location > -1)
      cout << "The item " << nums[location] << " was found at index location " << location << endl;
   else
      cout << "The item was not found in the list\n";

   return 0;
}

// this function returns the location of key in the list
// -1 is returned if the value is not found
int InterpolationSearch(int list[], int size, int key)
{
   int left, right, midp;
   left = 0;
   right = size - 1;
   while (list[left] < key && list[right] >= key)
   {
      midp = left + ((key - list[left]) * (right - left)) / (list[right] - list[left]);
      // calculate how many iterations it took to find a number:
      k++;
      if (list[midp] < key)
         left = midp + 1;
      // Repetition of the comparison code is forced by syntax limitations:
      else if (list[midp] > key)
         right = midp - 1;
      else
         return midp;
   }
   if (list[left] == key)
      return left;
   else
      return -1;   // Not found
}


Samples on sorting algorithms


Bubble sort

(from http://www.paked.net)
The bubble sort is one of the simplest sorting algorithms, but not one of the most efficient. It puts a list into increasing order by successively comparing adjacent elements, interchanging them if they are in the wrong order.

Source code:
#include <iostream.h>

int BubbleSort(int [], int);

int main()
{
   const int MAX_ITEMS = 10;
   int nums[MAX_ITEMS] = {22,5,67,98,45,32,101,99,73,10};
   int i, moves;

   moves = BubbleSort(nums, MAX_ITEMS);

   cout << "The sorted list, in ascending order, is:\n";
   for (i = 0; i < MAX_ITEMS; ++i)
      cout << " " << nums[i];
   cout << '\n' << moves << " moves were made to sort this list\n";

   return 0;
}

int BubbleSort(int num[], int max_items)
{
   int i, j, grade, moves = 0;
   for ( i = 0; i < (max_items - 1); i++)
   {
      for (j = 1; j < max_items; j++)
      {
         if (num[j] < num[j-1])
         {
            grade = num[j];
            num[j] = num[j-1];
            num[j-1] = grade;
            moves++;
         }
      }
   }
   return moves;
}


Selection sort

(from http://www.paked.net)
#include <iostream.h>

int SelectionSort(int [], int);

int main()
{
   const int MAX_ITEMS = 10;
   int nums[MAX_ITEMS] = {22,5,67,98,45,32,101,99,73,10};
   int i, moves;

   moves = SelectionSort(nums, MAX_ITEMS);

   cout << "The sorted list, in ascending order, is:\n";
   for (i = 0; i < MAX_ITEMS; i++)
      cout << " " << nums[i];
   cout << '\n' << moves << " moves were made to sort this list\n";

   return 0;
}

int SelectionSort(int num[], int max_items)
{
   int i, j, min, minidx, grade, moves = 0;
   for ( i = 0; i < (max_items - 1); i++)
   {
      min = num[i];     // assume minimum is the first array element
      minidx = i;       // index of minimum element
      for (j = i + 1; j < max_items; j++)
      {
         if (num[j] < min)   // if we've located a lower value
         {                   // capture it
            min = num[j];
            minidx = j;
         }
      }
      if (min < num[i])      // check if we have a new minimum
      {                      // and if we do, swap values
         grade = num[i];
         num[i] = min;
         num[minidx] = grade;
         moves++;
      }
   }
   return moves;
}


Heapsort

(from http://24bytes.com)
// Heap sort.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream.h>
#include <conio.h>

int heapSize = 15;

void print(int a[])
{
   for (int i = 0; i <= 14; i++)
   {
      cout << a[i] << "-";
   }
   cout << endl;
}

int parent(int i)
{
   if (i == 1)
      return 0;
   if (i % 2 == 0)
      return ((i / 2) - 1);
   else
      return ((i / 2));
}

int left(int i)
{
   return (2 * i) + 1;
}

int right(int i)
{
   return (2 * i) + 2;
}

void heapify(int a[], int i)
{
   int l = left(i), great;
   int r = right(i);
   // check the bounds before reading a child element:
   if ((l < heapSize) && (a[l] > a[i]))
   {
      great = l;
   }
   else
   {
      great = i;
   }
   if ((r < heapSize) && (a[r] > a[great]))
   {
      great = r;
   }
   if (great != i)
   {
      int temp = a[i];
      a[i] = a[great];
      a[great] = temp;
      heapify(a, great);
   }
}

void BuildMaxHeap(int a[])
{
   for (int i = (heapSize - 1) / 2; i >= 0; i--)
   {
      heapify(a, i);
      print(a);
   }
}

void HeapSort(int a[])
{
   BuildMaxHeap(a);
   for (int i = heapSize; i > 0; i--)
   {
      int temp = a[0];
      a[0] = a[heapSize - 1];
      a[heapSize - 1] = temp;
      heapSize = heapSize - 1;
      heapify(a, 0);
   }
}

void main()
{
   // 15 elements, matching heapSize:
   int arr[] = { 2, 9, 3, 6, 1, 4, 5, 7, 0, 8, 9, 8, 7, 7, 8 };
   HeapSort(arr);
   print(arr);
}


Counting sort
// Count Sort sample.cpp : Defines the entry point for the console application.
// from http://24bytes.com/Count-Sort.html
// Counting sort algorithm.
#include "stdafx.h"
#include <iostream.h>
#include <stdio.h>
#include <conio.h>

void print(int a[])
{
   for (int i = 0; i < 10; i++)
   {
      cout << a[i] << "-";
   }
   cout << endl;
}

int max(int a[])
{
   int maxValue = 0;
   for (int i = 0; i < 10; i++)
   {
      if (a[i] > maxValue)
      {
         maxValue = a[i];
      }
   }
   return maxValue;
}

void countTimes(int a[], int b[])
{
   int maxItem = max(a);
   int i;
   for (i = 0; i <= maxItem; i++)
   {
      b[i] = 0;
   }
   for (i = 0; i < 10; i++)
   {
      b[a[i]] = b[a[i]] + 1;
   }
}

void countRanks(int a[], int b[])
{
   // accumulate the counts up to the maximum value in the array
   int maxItem = max(a);
   for (int i = 1; i <= maxItem; i++)
   {
      b[i] += b[i - 1];
   }
}

void countSort(int a[], int b[], int c[])
{
   for (int i = 9; i >= 0; i--)
   {
      cout << a[i] << "-" << b[a[i]] << "\n";
      c[b[a[i]] - 1] = a[i];
      b[a[i]] = b[a[i]] - 1;
   }
}

void main()
{
   int a[] = { 1, 2, 4, 6, 7, 7, 0, 6, 3, 5};
   print(a);

   int *b;
   int maxValue = max(a);
   cout << "The maximum value is: " << maxValue << endl;

   int c[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
   b = new int[maxValue + 1];   // counts for values 0..maxValue

   countTimes(a, b);
   countRanks(a, b);
   countSort(a, b, c);
   print(c);

   delete[] b;
}


sort and partial sort (STL)

The sample code below illustrates how to use the partial_sort and partial_sort_copy STL functions in Visual C++.

Required Header:
<algorithm>

Prototype:
template<class RandomAccessIterator>
inline void partial_sort(
    RandomAccessIterator first,
    RandomAccessIterator middle,
    RandomAccessIterator last
)

Note: The class/parameter names in the prototype do not match the version in the header file. Some have been modified to improve readability. Description: The partial_sort algorithm sorts the smallest N elements, where N = middle - first of the sequence [first, last). The remaining elements end up in the range [middle..last) in an undefined order. The non-predicate version of partial_sort uses operator< for comparisons.
//////////////////////////////////////////////////////////////////////
//
// Compile options needed: /GX
//
// partial_sort_copy.cpp : Illustrates how to use the partial_sort_copy
//                         function.
//
// Functions:
//
//    partial_sort_copy : Sort the smallest N elements in a sequence
//                        and copy the resulting sequence
//                        to another sequence.
//////////////////////////////////////////////////////////////////////

// disable warning C4786: symbol greater than 255 characters,
// okay to ignore
#pragma warning(disable: 4786)

#include <iostream>
#include <algorithm>
#include <functional>
#include <vector>

using namespace std;

void main()
{
    const int VECTOR_SIZE = 8;

    // Define a template class vector of int
    typedef vector<int> IntVector;

    // Define an iterator for template class vector of int
    typedef IntVector::iterator IntVectorIt;

    IntVector Numbers(VECTOR_SIZE);
    IntVector Result(4);
    IntVectorIt start, end, it;

    // Initialize vector Numbers
    Numbers[0] = 4;
    Numbers[1] = 10;
    Numbers[2] = 70;
    Numbers[3] = 30;
    Numbers[4] = 10;
    Numbers[5] = 69;
    Numbers[6] = 96;
    Numbers[7] = 7;

    start = Numbers.begin();   // location of first element of Numbers
    end = Numbers.end();       // one past the location of the last element of Numbers

    cout << "Before calling partial_sort_copy\n" << endl;

    // print content of Numbers
    cout << "Numbers { ";
    for (it = start; it != end; it++)
        cout << *it << " ";
    cout << " }\n" << endl;

    // sort the smallest 4 elements in Numbers
    // and copy the results into Result
    partial_sort_copy(start, end, Result.begin(), Result.end());

    // Uncomment the next line to sort the same numbers in place:
    // sort(start, end);

    cout << "After calling partial_sort_copy\n" << endl;

    cout << "Numbers { ";
    for (it = start; it != end; it++)
        cout << *it << " ";
    cout << " }\n" << endl;

    cout << "Result { ";
    for (it = Result.begin(); it != Result.end(); it++)
        cout << *it << " ";
    cout << " }\n" << endl;
}


Sorting using binary search tree


// // // // // BinaryTreeC.cpp : Defines the entry point for the console application. Based on sample from MSDN article "References to pointers". Corrected (to make it work) by S.Chepurin, 08.03.2008 Purpose of the program: creates a BinaryTree structure used to output entered words in alpabetical order.

#include <stdio.h>
#include <iostream.h>
#include <string.h>

// Define a binary tree structure.
struct BTree
{
    char *szText;
    BTree *Left;
    BTree *Right;
};

// Define a pointer to the root of the tree.
BTree *btRoot = 0;

int Add1( BTree **Root, char *szToAdd );
int Add2( BTree*& Root, char *szToAdd );
void PrintTree( BTree* btRoot );

int main( int argc, char *argv[] )
{
    if( argc < 2 )
    {
        cerr << "Usage: Refptr [1 | 2]" << "\n";
        cerr << "\n\twhere:\n";
        cerr << "\t1 uses double indirection\n";
        cerr << "\t2 uses a reference to a pointer.\n";
        cerr << "\n\tInput is from stdin.\n";
        return 1;
    }

    char *szBuf = new char[128];

    // Read words from the standard input device and
    // build a binary tree.
    while( !cin.eof() )
    {
        cout << "Enter new word:" << endl;
        cin.getline( szBuf, 128, '\n' );
        printf("\n");

        switch( *argv[1] )
        {
        // Method 1: Use pointer to the pointer.
        case '1':
            Add1( &btRoot, szBuf );
            break;
        // Method 2: Use reference to a pointer.
        case '2':
            Add2( btRoot, szBuf );
            break;
        default:
            cerr << "Illegal value '" << *argv[1]
                 << "' supplied for add method.\n"
                 << "Choose 1 or 2.\n";
            return -1;
        }

        PrintTree( btRoot );


    }
    return 0;
}

// PrintTree: Display the binary tree in order.
void PrintTree( BTree* btRoot )
{
    // Traverse the left branch of the tree recursively.
    if( btRoot->Left )
        PrintTree( btRoot->Left );

    // Print the current node.
    cout << btRoot->szText << "\n";

    // Traverse the right branch of the tree recursively.
    if( btRoot->Right )
        PrintTree( btRoot->Right );
}

// Add1: Add a node to the binary tree.
// Uses pointer to the pointer dereference ("double indirection").
int Add1( BTree **Root, char *szToAdd )
{
    if( (*Root) == 0 )
    {
        (*Root) = new BTree;
        (*Root)->Left = 0;
        (*Root)->Right = 0;
        (*Root)->szText = new char[strlen( szToAdd ) + 1];
        strcpy( (*Root)->szText, szToAdd );
        return 1;
    }
    else if( strcmp( (*Root)->szText, szToAdd ) > 0 )
        return Add1( &((*Root)->Left), szToAdd );
    else
        return Add1( &((*Root)->Right), szToAdd );
}

// Add2: Add a node to the binary tree.
// Uses reference to a pointer.
int Add2( BTree*& Root, char *szToAdd )
{
    if( Root == 0 )
    {
        Root = new BTree;
        Root->Left = 0;
        Root->Right = 0;
        Root->szText = new char[strlen( szToAdd ) + 1];
        strcpy( Root->szText, szToAdd );
        return 1;
    }
    else if( strcmp( Root->szText, szToAdd ) > 0 )
        return Add2( Root->Left, szToAdd );
    else
        return Add2( Root->Right, szToAdd );
}
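As a quick check of the sorting behavior: run the program as Refptr 2 and enter the words cherry, apple and banana, one per line. After the third word, PrintTree outputs apple, banana, cherry. The in-order traversal (left subtree, node, right subtree) visits the stored strings in ascending strcmp order, which is what makes a binary search tree usable for sorting.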


Dictionary hashing algorithm (MSDN)

(from MSDN, "Dictionary hashing algorithm used by the LIB utility", Article ID: Q71891, July 17, 1997)

The last part of each library produced by the Microsoft Library Manager (LIB)¹ contains a dictionary that holds all the public symbols in the library. The hashing algorithm mentioned on page 63 of the "Microsoft C Developer's Toolkit Reference" is used to place data in the dictionary. The code required to implement the hashing algorithm is shown at the end of this article. (See the Appendix on page 3 for a description of the hashing table.)

¹ The Microsoft 32-bit Library Manager (LIB) creates and manages a library of Common Object File Format (COFF) object files (.obj). LIB can also be used to create export files and import libraries to reference exported definitions. LIB creates standard libraries, import libraries, and export files that you can use with LINK when building a 32-bit program. LIB runs from a command prompt. See MSDN, Visual C++ Programmer's Guide, "LIB Reference".

Algorithm description

The library dictionary is divided into pages that are 512 bytes long. Each page starts with a 37-byte bucket table, which contains 37 separate offsets to the symbols in the rest of the page. The hashing algorithm analyzes a symbol's name and produces two indexes (a page index and a bucket index) and two deltas (a page index delta and a bucket index delta). Using the offset contained in the bucket (addressed by the bucket index) on the page (addressed by the page index), you compare the symbol at that location with the one you are looking for. If (due to a symbol collision) you have not found the correct symbol, add the bucket index delta to the current bucket index and try again. Continue until all symbols in the bucket have been tried. (A sketch of this probing loop follows the sample code below.)

For more information on the actual format of the symbols in the dictionary, and on the format of the rest of the library, see the "Microsoft C Developer's Toolkit Reference".

Sample code:
/* This code illustrates the hashing algorithm used by LIB */
/* Compile options needed: none */

#include <stdio.h>
#include <string.h>
#include <malloc.h>
#include <stdlib.h>

#define XOR    ^
#define MODULO %

char *symbol;             /* Symbol to find (or to place) */
int dictLength;           /* Dictionary length in pages */
int buckets;              /* Number of buckets on one page */

char *ptrSymbolBegin;     /* A pointer to the beginning of the symbol */
char *ptrSymbolEnd;       /* A pointer to the end of the symbol */
int stringLength;         /* Length of the symbol's name */

int page_index;           /* Page Index */
int page_index_delta;     /* Page Index Delta */
int bucket_index;         /* Bucket Index */
int bucket_index_delta;   /* Bucket Index Delta */

unsigned character;


void hash(void)
{
    page_index = 0;
    page_index_delta = 0;
    bucket_index = 0;
    bucket_index_delta = 0;

    while( stringLength-- )
    {
        /* To convert a letter to lower case, set bit 5 of its ASCII
           code (equivalent to adding 32 for letters) using bitwise OR: */
        character = *(ptrSymbolBegin++) | 32;   /* from the beginning of the string */
        printf("\nSymbol: %c", character);
        page_index = (page_index << 2) XOR character;                  /* Hash */
        bucket_index_delta = (bucket_index_delta >> 2) XOR character;  /* Hash */

        /* Each cycle hashes two characters; the second pointer walks
           from the end of the string (note that it initially points
           at the terminating '\0'): */
        character = *(ptrSymbolEnd--) | 32;     /* from the end of the string */
        bucket_index = (bucket_index >> 2) XOR character;              /* Hash */
        page_index_delta = (page_index_delta << 2) XOR character;      /* Hash */
    }

    /* Calculate page index */
    page_index = page_index MODULO dictLength;

    /* Calculate page index delta */
    if( (page_index_delta = page_index_delta MODULO dictLength) == 0 )
        page_index_delta = 1;

    /* Calculate bucket offset */
    bucket_index = bucket_index MODULO buckets;

    /* Calculate bucket offset delta */
    if( (bucket_index_delta = bucket_index_delta MODULO buckets) == 0 )
        bucket_index_delta = 1;
}

void main(void)
{
    int i;

    dictLength = 3;
    buckets = 37;

    /* Exit if there is no memory to store the symbol: */
    if( (symbol = (char *) malloc( sizeof(char) * 4 )) == NULL )
        exit(1);

    strcpy( symbol, "one");
    for( i = 0; i < 2; i++ )
    {
        stringLength = strlen(symbol);
        ptrSymbolBegin = symbol;
        ptrSymbolEnd = symbol + stringLength;
        hash();
        printf("\nSymbol %d: page_index: %2d page_index_delta: %d",
               i, page_index, page_index_delta);
        printf("\n\t bucket_index: %2d bucket_index_delta: %d\n",
               bucket_index, bucket_index_delta);
        strcpy( symbol, "two");
    }
}
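The sample above only computes the two indexes and two deltas. The probing loop that the algorithm description implies is sketched below; the 2-D array of strings is an invented stand-in for the real 512-byte page format, and the example index values are arbitrary:

/* Probing sketch: how the two indexes and two deltas drive a lookup.
   The toy dictionary is a deliberate simplification of the real
   page/bucket layout. */
#include <stdio.h>
#include <string.h>

#define PAGES   3
#define BUCKETS 37

static const char *dict[PAGES][BUCKETS];   /* NULL means an empty bucket */

int lookup(const char *name, int pi, int pi_delta, int bi, int bi_delta)
{
    int p, b;
    for( p = 0; p < PAGES; p++ )            /* visit each page at most once */
    {
        int cur = bi;
        for( b = 0; b < BUCKETS; b++ )      /* visit each bucket at most once */
        {
            if( dict[pi][cur] && strcmp(dict[pi][cur], name) == 0 )
                return 1;                   /* symbol found */
            cur = (cur + bi_delta) % BUCKETS;   /* collision: step by the bucket delta */
        }
        pi = (pi + pi_delta) % PAGES;       /* page exhausted: step by the page delta */
    }
    return 0;                               /* not in the dictionary */
}

void main(void)
{
    /* Pretend hash() placed "one" at page 1, bucket 5. */
    dict[1][5] = "one";
    printf("%d\n", lookup("one", 1, 1, 5, 1));   /* prints 1 */
    printf("%d\n", lookup("two", 1, 1, 5, 1));   /* prints 0 */
}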

