
Analysis and Design of Algorithms

Introduction: algorithm definition and specification; design of algorithms; complexity of algorithms; asymptotic notations; growth of functions; recurrences; performance analysis. Elementary data structures: stacks and queues; trees; dictionaries; priority queues; sets and disjoint set union; graphs; basic traversal and search techniques. Divide and conquer: general method; binary search; merge sort; quick sort. The greedy method: general method; knapsack problem; minimum cost spanning tree; single source shortest path. Dynamic programming: general method; multistage graphs; all pairs shortest path; optimal binary search trees; 0/1 knapsack; travelling salesman problem; flow shop scheduling. Backtracking: general method; 8-queens problem; sum of subsets; graph colouring; Hamiltonian cycles; knapsack problem. Branch and bound: the method; 0/1 knapsack problem; travelling salesperson. Parallel models: basic concepts; performance measures; parallel algorithms: parallel complexity, analysis of parallel addition, parallel multiplication and division, parallel evaluation of general arithmetic expressions, first-order linear recurrence.
Preface
1. Introduction to Algorithms: definition and specification, notation used for writing algorithms, steps used to write algorithms, characteristics of algorithms.
2. Analysis of Algorithms: complexity of algorithms and efficiency measurement of algorithms, growth of functions, recurrences, performance.
3. Tools for development of algorithms: basic tools such as the top-down approach; the structured approach: selection statements, iteration, recursion; networks.
4. Data Structures: arrays, records, stacks, queues, trees, graphs, dictionaries, sets and disjoint set union.
5. Design of Algorithms: determining a model for the solution of a given problem; basic methods for designing algorithms: subgoals, hill climbing, working backward, heuristics, approximation.
6. Divide and Conquer: binary search, finding maximum and minimum, merge sort, quick sort, convex hull, matrix multiplication.
7. Greedy Methods: knapsack problem, job sequencing, minimum cost spanning tree (Kruskal's and Prim's algorithms), scheduling.
8. Backtracking: 8-queens problem, sum of subsets, graph colouring, Hamiltonian cycle, knapsack problem, cycle lock problem.
9. Branch and Bound: least cost search, 15-puzzle problem, 0/1 knapsack problem, travelling salesman problem.
10. Dynamic Programming: principle of optimality, optimal binary search trees, 0/1 knapsack problem, multistage graphs, all pairs shortest paths, travelling salesman problem, chain matrix multiplication, flow shop scheduling.
11. Parallel Models: basic concepts, performance measures, parallel algorithms: parallel complexity, analysis of parallel addition, parallel multiplication and division, parallel evaluation of general arithmetic expressions, first-order linear recurrence.

Books:

Algorithms and Data Structures by Cormen
Algorithms and Data Structures by Aho and Ullman
Introduction to Design and Analysis by Goodman
Fundamentals of Algorithms by Gilles Brassard and Paul Bratley
Fundamentals of Algorithms by Horowitz and Sahni
Data Structures, Algorithms and Applications by Sartaj Sahni

Algorithms: At first glance it may seem that someone has jumbled the first few letters of the word "logarithm". The word "algorithm" was not in dictionaries until 1957; there was only the word "algorism", meaning the process of doing arithmetic with Arabic numerals. For a time the word was thought to be a combination of algiros, meaning painful, and arithmos, meaning number. Eventually mathematicians found that it actually originates from the name of a famous Persian textbook author, al-Khwarizmi. Gradually the form and meaning of the word algorism became corrupted; the word was later refashioned by learned confusion with "arithmetic". One mathematical dictionary defined the word as covering the four types of calculation: addition, multiplication, subtraction and division. Leibniz used it for ways of calculating with infinitely small quantities. Later the proper definition of an algorithm was established.

An algorithm can be defined as a finite set of computational steps that transform a given input into the required output. It is a tool for solving a well-specified computational problem. An algorithm produces one or more outputs and has zero or more inputs, which are provided externally. In other words, an algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output; it is thus a sequence of computational steps that transform the input into the output. The statement of the problem specifies, in general terms, the desired input/output relationship, and the algorithm describes a specific computational procedure for achieving that relationship.

Notation used for writing an algorithm: Since an algorithm is an efficient and convenient pre-programming tool, it is usually written by the leading members of a software development project; the algorithms written by these persons are then assigned to supporting programmers for writing the corresponding programs. It is therefore essential that a fixed notation be used when writing algorithms. One of the earliest algorithms is Euclid's algorithm, which finds the greatest common divisor of two given positive integers. It can be written as follows:

Algorithm E (Euclid's algorithm): to determine the greatest common divisor of two given positive integers.
E1 [read data]: read two values a, b
E2 [exchange if required]: if (a < b) then t <- a; a <- b; b <- t; end if
E3 [find remainder]: r <- a mod b
E4 [check if zero]: if (r <> 0) then a <- b; b <- r; go to step E3; end if
E5 [display gcd]: print b
E6 [stop]

Some of the popular conventions used for writing algorithms are as follows:

Although this is not the format in which Euclid's algorithm was originally presented, any algorithm becomes easier to follow if some standards are observed; this algorithm has therefore been written using the following conventions.

- Each algorithm should be assigned a name, called the algorithm identification letter or set of letters; in Euclid's algorithm the identifier is the letter E.
- All steps of the algorithm should be identified by the algorithm's identifying letter(s) followed by a number which denotes the statement number, e.g. E1, E2 and so on in Euclid's algorithm.
- Each step of an algorithm should begin with a phrase in brackets [] which gives briefly the conceptual content of that step, e.g. [read data], [exchange if required] in Euclid's algorithm. This phrase should be followed by a description, in words and symbols, of the action to be performed or the decision to be made.
- If comments are to be added to a step, they should be placed in parentheses or marked by some specified symbol.
- The arrow (<-) should be used for any assignment or replacement operation: a <- b means that the value of variable b is stored in variable a; in other words, the value of variable a is replaced by the value of b. The proper order must be followed: if a <- b, b <- r is the intended order, then writing b <- r, a <- b has a different meaning.
- The algorithm starts at the lowest-numbered step, e.g. step 1, and steps are executed in sequential order unless otherwise specified by some control statement such as goto.
- Each algorithm must contain a statement that stops the algorithm.
- Subscripts or index numbers should be written in brackets [], e.g. Vj should be written as V[j] and Vij as V[i,j].

Characteristics of an algorithm: while developing an algorithm, one should ensure the presence of the following properties.

Finiteness: An algorithm must always terminate after a finite number of steps. In some cases the number of repetitions may be very large; a procedure that cannot be completed in a finite number of steps is referred to as a computational method rather than an algorithm.

Definiteness: Each step of an algorithm must be precisely defined. The action to be carried out must be rigorously and unambiguously specified for each case. Because the steps are narrated in English, a reader may sometimes feel that a step lacks definiteness; in such cases the mathematical expressions are written so that they resemble the instructions of a computer language.

Input: An algorithm has zero or more inputs, taken from a specified set of objects.

Output: An algorithm has one or more outputs, quantities which have a specified relation to the inputs.

Effectiveness: An algorithm is generally expected to be effective, meaning its steps should be sufficiently basic that a person could, in principle, carry them out exactly.
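As a companion to algorithm E, here is a minimal C sketch of the same steps; the function name gcd and the small driver are illustrative, not part of the notation above.

#include <stdio.h>

/* Algorithm E as a C function: greatest common divisor of two
   positive integers (steps E2-E5 of the text). */
int gcd(int a, int b)
{
    int t, r;
    if (a < b) {              /* E2: exchange if required */
        t = a; a = b; b = t;
    }
    r = a % b;                /* E3: find remainder */
    while (r != 0) {          /* E4: if not zero, repeat from E3 */
        a = b;
        b = r;
        r = a % b;
    }
    return b;                 /* E5: b holds the gcd */
}

int main(void)
{
    printf("gcd(12, 18) = %d\n", gcd(12, 18));   /* prints 6 */
    return 0;
}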

Steps used in the development of a complete algorithm: Whenever a problem is to be solved, the user should go through a series of steps. These steps support the determination of an appropriate solution. They are as follows:

1. Statement of the problem: First, the problem must be clearly defined. The definition must be stated precisely enough to help the solution developer understand it properly, and the vocabulary used in the statement must be understandable to the developer. What is to be found must be known, and the developer should be in a position to recognize a solution. If any information is missing, it should be identified; if any assumptions are made, they must be stated explicitly. E.g. a sales representative takes care of the customers of a company and covers 20 cities. He works for a large commission, but his company will reimburse only 50% of his total travelling cost. He has figured out how much it costs him to travel by car between every pair of cities, and he wants to reduce his travelling cost. In this statement, what is known is the list of cities and the cost cij for each pair of cities (a cost matrix); what is to be found is a reduction in travelling cost. How to recognize the solution is a bit more difficult, so we may ask for more information, e.g. the salesman's base city, the number of customers in each city, and so on.

2. Development of a model: After stating the problem, it is formulated as a mathematical model. Each problem requires separate attention, and a successful model is usually derived by acquiring some experience: one may examine various available models, and look at identical problems that were solved using a model which resembles the current one. The mathematical structure which is best suited is then chosen. The appropriateness of a model is judged by factors such as: convenience of representation; computational simplicity; usefulness of the various operations associated with the structure. After the development of a mathematical model, the problem should be restated in terms of these mathematical objects. In the salesman problem, consider 5 cities. The problem can be represented using a graph or network, and the graph can be represented by a 5 x 5 cost matrix:

       A  B  C  D  E
   A   0  2  3  6  4
   B   2  0  6  0  9
   C   3  5  0  4  7
   D   6  0  4  0  8
   E   4  9  7  8  0

[Figure: the corresponding weighted graph on the five cities A-E.]

So a tour A-D-C-E-B-A contributes the cost 6+4+7+9+2 = 28 units.

3. Design of an algorithm: After the development of the mathematical model, the algorithm is designed. The design technique depends largely on the mathematical model. There can be more than one algorithm for the same problem, and these may differ in effectiveness. E.g. in the salesman problem we can generate all possible permutations of the first N-1 positive integers, where the cities are numbered from 1 to N and city N is the base city. We consider every possible tour and choose a tour with least cost MIN. Let the algorithm be designated ETS. The input to the algorithm is the number of cities and the cost of travelling between each pair of cities in a cost matrix. The algorithm can be written as follows:

Algorithm ETS (Exhaustive Travelling Salesman): There are N cities. It is required to find a cost-effective tour for a salesman; the cost of travel between each pair of cities is given.
ETS-1 [generate all permutations]: for I <- 1 to (N-1)! repeat steps ETS-2 to ETS-4
ETS-2 [get new permutation]: P <- the I-th permutation of the integers 1 to N-1
ETS-3 [construct new tour]: construct the tour T(P) that corresponds to the permutation P, and compute the cost COST(T(P))
ETS-4 [compare]: if COST(T(P)) < MIN then set TOUR <- T(P) and MIN <- COST(T(P))
Steps ETS-2 and ETS-3 require sub-algorithms (a complete C sketch is given at the end of this section).

4. Correctness of the algorithm: This is often the most tedious part of developing an algorithm. A set of test data is prepared covering a variety of cases, and the algorithm is checked against this test data. Note that testing says nothing about efficiency. E.g. in the case of ETS, correctness can be seen by a simple case study.

5. Implementation: Finally the algorithm is coded as a computer program. Some algorithm steps can be converted directly into program statements, but others may be difficult to translate and may require subroutines or suitable data structures. These considerations affect the computer memory used as well as the speed of execution of the program.
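To make the exhaustive method concrete, the following is a minimal C sketch of ETS that enumerates all (N-1)! tours by recursion. The cost matrix is the five-city example above, with the two entries that are unreadable in the source filled in arbitrarily (B-C = 5, B-D = 7), so the printed optimum is only illustrative.

#include <stdio.h>

#define N 5                        /* number of cities */

/* Symmetric cost matrix (partly assumed, see above). */
int cost[N][N] = {
    {0, 2, 3, 6, 4},
    {2, 0, 5, 7, 9},
    {3, 5, 0, 4, 7},
    {6, 7, 4, 0, 8},
    {4, 9, 7, 8, 0}
};

int best = 1 << 30;                /* MIN: cost of the best tour so far */
int perm[N], used[N];

/* ETS as recursion: extend the partial tour perm[0..k-1]
   (perm[0] is the base city 0) and compare every complete
   tour against MIN, as in step ETS-4. */
void ets(int k, int sofar)
{
    int i;
    if (k == N) {                  /* tour complete: close the cycle */
        int total = sofar + cost[perm[N-1]][0];
        if (total < best)
            best = total;          /* keep the cheaper tour */
        return;
    }
    for (i = 1; i < N; i++) {      /* next unused city (step ETS-2) */
        if (!used[i]) {
            used[i] = 1;
            perm[k] = i;
            ets(k + 1, sofar + cost[perm[k-1]][i]);
            used[i] = 0;
        }
    }
}

int main(void)
{
    perm[0] = 0;                   /* city A is the base city */
    used[0] = 1;
    ets(1, 0);
    printf("Cost of the cheapest tour = %d\n", best);
    return 0;
}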

Analysis of Algorithms
The performance of a program can be measured in terms of amount of computer memory and time needed to run a program. There are two approaches namely analytical and experimental. In performance analysis the analytical approach is used while in performance measurement the experimental approach is used.

Space Complexity: The space complexity of a program is the amount of memory it needs to run to completion. The space needed by a program has components such as instruction space, data space and environmental stack space.

a. Instruction space: the space used to store the compiled version of the program. It depends upon the compiler used, the compiler options in effect at the time of compilation, and the target computer.

b. Data space: the space needed to store all constant and variable values. It depends on the sizes of the data types as prescribed by the respective compiler of the language. The following table shows the sizes used by a typical (16-bit) C++ compiler:

   Data type         Size (bytes)   Range
   char              1              -128 to 127
   unsigned char     1              0 to 255
   short int / int   2              -32768 to 32767
   unsigned int      2              0 to 65535
   long              4              -2^31 to 2^31 - 1
   float             4              up to about 3.4e38
   double            8              up to about 1.7e308
   long double       10             3.4e-4932 to 1.1e+4932
   pointer (near)    2
   pointer (far)     4

c. Environmental stack space: the environmental stack is used to save the information needed to resume execution of partially completed functions. Each time a function is invoked, the following data are saved on the environmental stack: the return address; the values of all local variables and formal parameters of the invoked function; the bindings of all reference and const reference parameters.

Time Complexity: The time complexity of a program depends on all the factors that the space complexity depends on. The time taken by a program is the sum of the compile time and the time taken by the program in execution. The compile time does not depend on the instance characteristics, and since compilation is done once while execution is repeated many times, the compile time can be ignored in performance analysis. Many of the factors on which the run time T(P) depends are not known, therefore we concern ourselves with operations such as additions, subtractions, multiplications, divisions, comparisons, stores, loads and so on. Letting n denote the instance characteristics, we might have an expression for Tp(n) of the form

   Tp(n) = ca ADD(n) + cs SUB(n) + cm MUL(n) + ...

where ca, cs, cm and so on are constants representing the time needed for an addition, subtraction, multiplication and so on respectively, and ADD(n), SUB(n), MUL(n) are the numbers of such operations performed. In practice the time is evaluated by a total operation count or a total step count.
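As a small illustration of step counting, the following C sketch charges one step for each assignment, loop test and addition in a summing loop. The counting scheme (what exactly counts as one step) is a convention chosen here for the example, not something fixed by the text.

#include <stdio.h>

int count = 0;                /* global step counter (illustrative) */

int sum(int a[], int n)
{
    int s = 0, i;
    count++;                  /* assignment s = 0          */
    for (i = 0; i < n; i++) {
        count++;              /* loop test and increment   */
        s += a[i];
        count++;              /* one addition per element  */
    }
    count++;                  /* final (failing) loop test */
    return s;
}

int main(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    sum(a, 5);
    printf("steps = %d\n", count);   /* 2n + 2 = 12 for n = 5 */
    return 0;
}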

Asymptotic Notation (O, Ω, Θ, o)


Big Oh: It is also called an upper bound of a function. Consider a function f(n) which is non-negative for all integers n ≥ 0. We say that f(n) is big oh of g(n), which we write f(n) = O(g(n)), if there exists an integer n0 and a constant c > 0 such that for all integers n ≥ n0, f(n) ≤ c g(n). The figure shows the relation between f(n) and c g(n). Mathematically we can say that

   O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }

Consider the function f(n) = 8n + 128, as shown in the following figure. Clearly, f(n) is non-negative for all integers n ≥ 0. We wish to show that f(n) = O(n²). By the definition of big oh, we need to find an integer n0 and a constant c > 0 such that for all integers n ≥ n0, f(n) ≤ c n². It does not matter what the particular constants are. For example, suppose we choose c = 1. Then

   f(n) ≤ c n²  ⇔  8n + 128 ≤ n²  ⇔  n² − 8n − 128 ≥ 0  ⇔  (n − 16)(n + 8) ≥ 0.

Since (n + 8) > 0 for all values of n ≥ 0, we conclude that we need (n − 16) ≥ 0, that is, n0 = 16. So we have that, for c = 1 and n0 = 16, f(n) ≤ c n² for all integers n ≥ n0. Hence f(n) = O(n²). The figure clearly shows that the function n² is greater than the function f(n) = 8n + 128 to the right of n = 16.
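A quick numerical check of the crossover at n = 16 (a throwaway sketch, not part of the formal argument):

#include <stdio.h>

/* Tabulate f(n) = 8n + 128 against n^2 near n = 16 to see
   where the bound f(n) <= c*n^2 (with c = 1) starts to hold. */
int main(void)
{
    int n;
    for (n = 12; n <= 20; n++)
        printf("n = %2d   8n+128 = %3d   n^2 = %3d   %s\n",
               n, 8*n + 128, n*n,
               (8*n + 128 <= n*n) ? "f(n) <= n^2" : "f(n) >  n^2");
    return 0;
}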

Omega: It is also called a lower bound. Consider a function f(n) which is non-negative for all integers n ≥ 0. We say that f(n) is omega of g(n), which we write f(n) = Ω(g(n)), if there exists an integer n0 and a constant c > 0 such that for all integers n ≥ n0, f(n) ≥ c g(n). The figure shows the relation between f(n) and c g(n). Mathematically we can say that

   Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }

Consider the function f(n) = 5n² − 64n + 256, which is shown in the following figure. Clearly, f(n) is non-negative for all integers n ≥ 0. We wish to show that f(n) = Ω(n²). By the definition of omega notation, we need to find an integer n0 and a constant c > 0 such that for all integers n ≥ n0, f(n) ≥ c n². It does not matter what the particular constants are. For example, suppose we choose c = 1. Then

   f(n) ≥ c n²  ⇔  5n² − 64n + 256 ≥ n²  ⇔  4n² − 64n + 256 ≥ 0  ⇔  4(n − 8)² ≥ 0.

Since (n − 8)² ≥ 0 for all values of n ≥ 0, we conclude that n0 = 0. So we have that, for c = 1 and n0 = 0, f(n) ≥ c n² for all integers n ≥ n0. Hence f(n) = Ω(n²). The figure clearly shows that the function n² is less than the function f(n) = 5n² − 64n + 256 for all values of n ≥ 0. Of course, there are many other values of c and n0 that will do; for example, c = 2 and n0 = 16.

Theta: It is also called a tight bound (i.e. both an upper and a lower bound) of a function. The notation Θ(·) describes a function which is both O(g(n)) and Ω(g(n)) for the same g(n). Consider a function f(n) which is non-negative for all integers n ≥ 0. We say that f(n) is theta of g(n), which we write f(n) = Θ(g(n)), if and only if f(n) is O(g(n)) and f(n) is Ω(g(n)). Here it must be ensured that for the given function the bounding function g(n) is the same in both cases; only the corresponding constants may differ. Mathematically,

   Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }


Little oh: It describes an upper bound of a function that is not asymptotically tight. Consider a function f(n) which is non-negative for all integers n ≥ 0. We say that f(n) is little oh of g(n), which we write f(n) = o(g(n)), if and only if f(n) is O(g(n)) but f(n) is not Ω(g(n)). Little oh notation represents a kind of loose asymptotic bound in the sense that if we are given f(n) = o(g(n)), then we know that g(n) is an asymptotic upper bound, since f(n) = O(g(n)), but g(n) is not an asymptotic lower bound, since f(n) = O(g(n)) and f(n) ≠ Ω(g(n)) together imply that f(n) ≠ Θ(g(n)).

For example, consider the function f(n) = n + 1. Clearly, f(n) = O(n²). Clearly too, f(n) ≠ Ω(n²), since no matter what c we choose, for large enough n we have c n² > n + 1. Thus, we may write f(n) = n + 1 = o(n²).

Heuristic & Approximation Algorithms


Some problems are difficult to solve exactly, and we then try to obtain a solution by adopting some instructive approach. Sometimes we have to find some sort of solution to a problem whether it is hard or not. For such cases the heuristic and approximation algorithms are used.

Heuristic algorithm: It is a procedure, suggested by our intuition, that may produce a good or even optimal solution to our problem if we are lucky, and may fail to produce a proper solution if we are not. A heuristic may be deterministic or probabilistic; a probabilistic heuristic gives no guarantee about its solution.

Graph colouring heuristic algorithm: Let G = (N, A) be an undirected graph. We want to paint the nodes of G in such a way that no two adjacent nodes have the same colour. The problem is to determine the minimum number of colours required. Consider an undirected graph with five nodes. The greedy heuristic consists of choosing a colour and an arbitrary starting node, and then considering each node in turn: if a node can be painted with the first colour, we do so. When no further nodes can be painted, we choose a new colour and a new starting node that has not yet been painted. Then we paint as many nodes as we can with this second colour, painting a node whenever none of its neighbours has already been painted with the second colour. If there are still unpainted nodes, we choose a third colour, and this process is continued until all the nodes are painted.

Suppose we are given a graph G with nodes {1, 2, 3, 4, 5}, i.e. five nodes. We wish to paint each node with a colour different from those of its adjacent nodes, and there are as many orders of considering the nodes as we can think of. In the first order, two colours suffice (e.g. red and blue): red for nodes 1, 3, 4 and blue for nodes 2, 5, as shown in the figure. This is an optimal solution for the given graph, since all the nodes cannot be painted with one colour; painting with two colours is therefore optimal. Suppose instead we paint nodes 1 and 5 red; then node 2 can be painted blue, but we need a third colour, say green, to paint nodes 3 and 4, so a minimum of three colours is required in this order. This is not the optimal solution, since a solution with two colours is already available. A greedy colouring sketch in C is given after the figure.
[Figure: the five-node graph, painted once with two colours (optimal) and once with three colours (non-optimal).]
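Below is a minimal C sketch of the greedy colouring heuristic. Since the figure's edges are not recoverable here, the adjacency matrix is an assumed five-node graph; replace it with the edges of your own graph.

#include <stdio.h>

#define N 5                        /* nodes of the example graph */

/* Assumed adjacency matrix (edges 1-2, 1-3, 1-4, 2-3, 2-5, 4-5). */
int adj[N][N] = {
    {0, 1, 1, 1, 0},
    {1, 0, 1, 0, 1},
    {1, 1, 0, 0, 0},
    {1, 0, 0, 0, 1},
    {0, 1, 0, 1, 0}
};

int colour[N];                     /* 0 = unpainted, 1, 2, ... = colours */

int main(void)
{
    int c = 0, i, j, remaining = N;
    while (remaining > 0) {
        c++;                       /* open a new colour */
        for (i = 0; i < N; i++) {
            int ok = (colour[i] == 0);
            for (j = 0; ok && j < N; j++)
                if (adj[i][j] && colour[j] == c)
                    ok = 0;        /* a neighbour already has colour c */
            if (ok) {
                colour[i] = c;
                remaining--;
            }
        }
    }
    for (i = 0; i < N; i++)
        printf("node %d -> colour %d\n", i + 1, colour[i]);
    printf("colours used = %d\n", c);
    return 0;
}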

Travelling salesman problem using a heuristic algorithm: A travelling salesman visits a set of cities. The cost of travel between each pair of cities is given. A tour is a simple directed cycle covering all the cities, with the constraint that each city must be visited exactly once; the cost of a tour is the sum of all the costs along the path. The problem is to find a tour with optimal, i.e. lowest, cost. The problem can be put another way: let G = (V, E) be an undirected graph, where V is the set of vertices and E the set of edges. A tour of G is a directed cycle that includes every vertex in V exactly once and at the end comes back to the starting vertex (a Hamiltonian cycle). The cost of a tour is the sum of the costs of the edges on the tour, and the travelling salesman problem is to find a tour of minimum cost.

For example, suppose the TSP instance contains six towns with the following distances:

   from\to   1    2    3    4    5    6
      1      0    3   10   11    7   25
      2      3    0    8   12    9   26
      3     10    8    0    9    4   20
      4     11   12    9    0    5   15
      5      7    9    4    5    0   18
      6     25   26   20   15   18    0

[Figure: the six towns drawn as a graph.]

The greedy heuristic consists of starting at an arbitrary node and then choosing, at each step, to visit the nearest remaining unvisited node. In the example, if we start at town 1, then the nearest unvisited town is town 2. From town 2 the nearest unvisited town is town 3, and so on. After visiting the last town we come back to the starting point. Thus the tour constructed is 1, 2, 3, 5, 4, 6 and back to 1, and the tour cost is 3 + 8 + 4 + 5 + 15 + 25 = 60, whereas the optimal solution for this instance has total cost 58, constructed from the tour 1, 2, 3, 6, 4, 5 and back to 1 (cost 3 + 8 + 20 + 15 + 5 + 7 = 58). This shows that the heuristic algorithm does not always determine the optimal solution.
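Here is a minimal C sketch of the nearest-neighbour heuristic on the six-town matrix above; starting from town 1 it reproduces the greedy tour 1-2-3-5-4-6-1 of cost 60.

#include <stdio.h>

#define N 6

/* Distance matrix for the six-town example above. */
int d[N][N] = {
    { 0,  3, 10, 11,  7, 25},
    { 3,  0,  8, 12,  9, 26},
    {10,  8,  0,  9,  4, 20},
    {11, 12,  9,  0,  5, 15},
    { 7,  9,  4,  5,  0, 18},
    {25, 26, 20, 15, 18,  0}
};

int main(void)
{
    int visited[N] = {0}, tour = 0, cur = 0, i, step;
    visited[0] = 1;
    printf("1");
    for (step = 1; step < N; step++) {
        int next = -1;
        for (i = 0; i < N; i++)     /* pick the nearest unvisited town */
            if (!visited[i] && (next == -1 || d[cur][i] < d[cur][next]))
                next = i;
        tour += d[cur][next];
        visited[next] = 1;
        cur = next;
        printf(" -> %d", next + 1);
    }
    tour += d[cur][0];              /* return to the start */
    printf(" -> 1\ncost of greedy tour = %d\n", tour);
    return 0;
}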

Backtracking
Backtracking is a systematic way to search for the solution to a problem. In backtracking we begin by defining a solution space for the problem; this space includes at least one (optimal) solution to the problem.

Sum of subsets: This is a kind of NP-complete problem. Given positive numbers wi, 1 <= i <= n, and m, the problem calls for finding all subsets of the wi whose sums are m. E.g. if n = 4, (w1, w2, w3, w4) = (11, 13, 24, 7) and m = 31, then the desired subsets are (11, 13, 7) and (24, 7). These solutions are generally represented using the indices of the values, so the same subsets can be written as (1, 2, 4) and (3, 4). In another formulation of the sum of subsets problem, each solution subset is represented by an n-tuple (x1, x2, x3, ..., xn) such that xi belongs to {0, 1}, 1 <= i <= n; xi = 0 if wi is not chosen and xi = 1 if wi is chosen. This formulation expresses all solutions using a fixed-size tuple. For verification, one can check that there are 2^n distinct tuples. As another example, let n = 6, m = 30 and (w1, w2, w3, w4, w5, w6) = (5, 10, 12, 13, 15, 18). The following program implements the sum of subsets problem.
#include <stdio.h>

int count, w[10], d, x[10];

/* cs: sum of the elements included so far, k: current index,
   r: sum of the remaining elements w[k..n-1]. */
void subset(int cs, int k, int r)
{
    int i;
    x[k] = 1;                             /* try including w[k] */
    if (cs + w[k] == d) {                 /* subset found */
        printf("\nSubset solution = %d\n", ++count);
        for (i = 0; i <= k; i++)
            if (x[i] == 1)
                printf("%d\n", w[i]);
    }
    else if (cs + w[k] + w[k+1] <= d)     /* still room: extend */
        subset(cs + w[k], k + 1, r - w[k]);
    /* try excluding w[k], if a solution is still possible */
    if ((cs + r - w[k] >= d) && (cs + w[k+1] <= d)) {
        x[k] = 0;
        subset(cs, k + 1, r - w[k]);
    }
}

int main(void)
{
    int sum = 0, i, n;
    printf("Enter the number of elements\n");
    scanf("%d", &n);
    printf("Enter the elements in ascending order\n");
    for (i = 0; i < n; i++)
        scanf("%d", &w[i]);
    printf("Enter the required sum\n");
    scanf("%d", &d);
    for (i = 0; i < n; i++)
        sum += w[i];
    if (sum < d) {
        printf("No solution exists\n");
        return 0;
    }
    printf("The solution is\n");
    count = 0;
    subset(0, 0, sum);
    return 0;
}

Divide & Conquer


Divide and conquer (D&C) is an important algorithm design paradigm based on multi-branched recursion. A
divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same (or a related) type. This decomposition is continued until the problems become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers, syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFTs). Many recursive algorithms take a problem with a given input and divide it into one or more smaller problems. This reduction is repeatedly applied until the solutions of the smaller problems can be found quickly. Such a procedure is called a divide-and-conquer algorithm: it divides a problem into one or more instances of the same problem of smaller size, and it conquers the problem by using the solutions of the smaller problems to find a solution of the original problem, possibly with some additional work. Hence the divide and conquer approach is applied to problems which are decomposable and where the same method of solution can be applied to each segment.

a. Binary Search: A binary search is an algorithm for locating the position of an item in a sorted array. The idea is simple: compare the target to the middle item in the list. If the target is the same as the middle item, we have found the target. If it is before the middle item, we repeat this procedure on the items before the middle; if it is after the middle item, we repeat on the items after the middle. The method halves the number of items to check each time, so it finds the item, or determines that it is not present, in logarithmic time. Let f(n) be the number of comparisons needed for the search of an element in a list of size n (suppose n is even). A list of size n is reduced into two lists, where each list has size n/2, and two comparisons are needed to implement this reduction: one to check which half of the list to use, and one to check if any terms of the list remain. So f(n) = f(n/2) + 2 for even n. Binary search is useful for finding where an item is in a sorted array. For example, if there is an array of contact information, with people's names, addresses and telephone numbers sorted by name, we can use binary search to find out a few useful facts: whether the person's information is in the array, what the person's address is, and what the person's telephone number is. Binary search takes far fewer comparisons than a linear search, but there are some downsides. Binary search can be slower than using a hash table. And if items are changed, the array has to be re-sorted so that binary search will work properly, which can take so much time that the savings from using binary search are not worth it.

Algorithm :
BinarySearch(A[0..N-1], value, low, high)
{
    if (high < low)
        return -1                        // not found
    mid = low + ((high - low) / 2)
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid - 1)
    else if (A[mid] < value)
        return BinarySearch(A, value, mid + 1, high)
    else
        return mid                       // found
}

b. Quick Sort: Quicksort is a sorting algorithm developed by Tony Hoare. On average it makes O(n log n) comparisons to sort n items; in the worst case it makes O(n²) comparisons, though this behaviour is rare. Quicksort is often faster in practice than other O(n log n) algorithms. Quicksort is also known as "partition-exchange sort". Quicksort first divides a large list into two smaller sub-lists, the low elements and the high elements, and can then recursively sort the sub-lists. The steps are:
1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base cases of the recursion are lists of size zero or one, which never need to be sorted. In very early versions of quicksort, the leftmost element of the partition was often chosen as the pivot element. Unfortunately, this causes worst-case behaviour on already sorted arrays. The problem is easily solved by choosing a random index for the pivot, choosing the middle index of the partition, or (especially for longer partitions) choosing the median of the first, middle and last elements of the partition as the pivot. Selecting a pivot element is also complicated by the possibility of integer overflow. If the boundary indices of the subarray being sorted are sufficiently large, the expression for the middle index, (left + right)/2, will overflow and provide an invalid pivot index. This can be overcome by using, for example, left + (right - left)/2 to index the middle element, at the cost of slightly more complex arithmetic. The general quicksort algorithm can be written as follows:
function quicksort(array, 'left', 'right')
{
    // If the list has 2 or more items
    if 'left' < 'right'
        // See the discussion of pivot choice above
        choose any 'pivotIndex' such that 'left' <= 'pivotIndex' <= 'right'
        // Get lists of bigger and smaller items and final position of pivot
        'pivotNewIndex' := partition(array, 'left', 'right', 'pivotIndex')
        // Recursively sort elements smaller than the pivot
        quicksort(array, 'left', 'pivotNewIndex' - 1)
        // Recursively sort elements at least as big as the pivot
        quicksort(array, 'pivotNewIndex' + 1, 'right')
}
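The pseudocode leaves partition unspecified. Here is one concrete C instance, using the simple Lomuto scheme with the last element as pivot; this is just one of the pivot choices discussed above, not the only possibility.

#include <stdio.h>

/* Lomuto partition: pivot is the last element of a[left..right];
   returns the pivot's final position. */
int partition(int a[], int left, int right)
{
    int pivot = a[right], i = left, j, t;
    for (j = left; j < right; j++) {
        if (a[j] < pivot) {          /* move smaller items to the front */
            t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    t = a[i]; a[i] = a[right]; a[right] = t;   /* pivot to final place */
    return i;
}

void quicksort(int a[], int left, int right)
{
    if (left < right) {
        int p = partition(a, left, right);
        quicksort(a, left, p - 1);   /* elements smaller than the pivot */
        quicksort(a, p + 1, right);  /* elements at least as big        */
    }
}

int main(void)
{
    int a[] = {9, 2, 7, 1, 5, 3}, i;
    quicksort(a, 0, 5);
    for (i = 0; i < 6; i++)
        printf("%d ", a[i]);         /* prints 1 2 3 5 7 9 */
    printf("\n");
    return 0;
}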

c. Merge Sort : Merge sort is an O(n log n) comparison-based sorting algorithm. Most implementations
produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. It is a divide and conquer algorithm. Merge sort was invented by John von Neumann in 1945. Sorting is a common and important problem in computing: given a sequence of N data elements, we are required to generate an ordered sequence that contains the same elements. A parallel version of the well-known merge sort algorithm assumes that the sequence to be sorted is distributed, and so generates a distributed sorted sequence. For simplicity, one assumes that N is an integer multiple of the number of tasks P, that the N data are distributed evenly among the P tasks, and that P is an integer power of two.

Conceptually, a merge sort works as follows. If the list is of length 0 or 1, then it is already sorted. Otherwise: divide the unsorted list into two sublists of about half the size; sort each sublist recursively by re-applying the merge sort; merge the two sublists back into one sorted list. Merge sort incorporates two main ideas to improve its runtime: a small list takes fewer steps to sort than a large list, and fewer steps are required to construct a sorted list from two sorted lists than from two unsorted lists. Example: use merge sort to sort a list of integers contained in an array. Suppose we have an array A with n indices ranging from A[0] to A[n-1]. We apply merge sort to A[0..c-1] and A[c..n-1], where c is the integer part of n/2. When the two halves are returned they will have been sorted, and they can now be merged together to form a sorted array. In simple pseudocode form, the algorithm could look something like this:
function merge_sort(m)
    if length(m) <= 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after or equal middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result

function merge(left, right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) <= first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append first(left) to result
            left = rest(left)
        else if length(right) > 0
            append first(right) to result
            right = rest(right)
    end while
    return result
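For comparison with the list-based pseudocode, here is a compact array-based C sketch; the fixed-size scratch buffer tmp is an assumption made to keep the example short.

#include <stdio.h>
#include <string.h>

static int tmp[100];                          /* scratch buffer */

void merge_sort(int a[], int lo, int hi)      /* sorts a[lo..hi-1] */
{
    int mid, i, j, k;
    if (hi - lo <= 1)
        return;                               /* length 0 or 1: sorted */
    mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid);                   /* sort left half  */
    merge_sort(a, mid, hi);                   /* sort right half */
    for (i = lo, j = mid, k = lo; k < hi; k++)    /* merge halves */
        if (i < mid && (j >= hi || a[i] <= a[j])) /* <= keeps it stable */
            tmp[k] = a[i++];
        else
            tmp[k] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

int main(void)
{
    int a[] = {38, 27, 43, 3, 9, 82, 10}, i;
    merge_sort(a, 0, 7);
    for (i = 0; i < 7; i++)
        printf("%d ", a[i]);                  /* prints 3 9 10 27 38 43 82 */
    printf("\n");
    return 0;
}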

d. Finding maximum and minimum: Let {a1, a2, ..., an} be a list. If n = 1, then a1 is both the max and the min. Suppose n > 1 and let f(n) be the total number of comparisons needed to find the max and the min elements of a list with n elements. A list of size n is reduced into two lists, where each list has half the size, or one sub-list has one element more than the other. Then two comparisons are needed to combine the results of this reduction: one to compare the maxes of the two sub-lists and the other to compare the mins, giving f(n) = 2 f(n/2) + 2 for even n.

1. Simple linear search approach: Initialize the values of min and max as the minimum and maximum of the first two elements respectively. Starting from the 3rd element, compare each element with max and min, and change max and min accordingly (i.e., if the element is smaller than min then change min, else if the element is greater than max then change max, else ignore the element).
/* A structure is used to return two values from getMinMax()
   (linear approach, without divide & conquer). */
#include <stdio.h>

struct pair {
    int min;
    int max;
};

struct pair getMinMax(int arr[], int n)
{
    struct pair minmax;
    int i;

    /* If there is only one element, return it as both min and max. */
    if (n == 1) {
        minmax.max = arr[0];
        minmax.min = arr[0];
        return minmax;
    }

    /* If there is more than one element, initialize min and max
       from the first two. */
    if (arr[0] > arr[1]) {
        minmax.max = arr[0];
        minmax.min = arr[1];
    } else {
        minmax.max = arr[1];
        minmax.min = arr[0];
    }

    for (i = 2; i < n; i++) {
        if (arr[i] > minmax.max)
            minmax.max = arr[i];
        else if (arr[i] < minmax.min)
            minmax.min = arr[i];
    }

    return minmax;
}

/* Driver program to test the above function */
int main(void)
{
    int arr[] = {1000, 11, 445, 1, 330, 3000};
    int arr_size = 6;
    struct pair minmax = getMinMax(arr, arr_size);
    printf("\nMinimum element is %d", minmax.min);
    printf("\nMaximum element is %d\n", minmax.max);
    return 0;
}

The following algorithm finds the maximum and minimum data element among the given list of data elements.
MaxMin(i, j, max, min)
{
    if (i = j) then
        max = min = a[i]; return;
    else if (i = j - 1) then
        if (a[i] < a[j]) then
            max = a[j]; min = a[i];
        else
            max = a[i]; min = a[j];
        end if
    else
        mid = [(i + j)/2];
        MaxMin(i, mid, max, min);
        MaxMin(mid + 1, j, max1, min1);
        if (max < max1) then max = max1;
        if (min > min1) then min = min1;
    end if
}

Here is a C program which finds the maximum and minimum using divide and conquer.
#include <stdio.h>

int max, min;
int a[100];

/* Recursively find the max and min of a[i..j] by divide and conquer. */
void maxmin(int i, int j)
{
    int max1, min1, mid;
    if (i == j) {
        max = min = a[i];
    }
    else if (i == j - 1) {            /* two elements: one comparison */
        if (a[i] < a[j]) {
            max = a[j];
            min = a[i];
        } else {
            max = a[i];
            min = a[j];
        }
    }
    else {
        mid = (i + j) / 2;
        maxmin(i, mid);               /* solve the left half  */
        max1 = max;
        min1 = min;
        maxmin(mid + 1, j);           /* solve the right half */
        if (max < max1) max = max1;   /* combine the results  */
        if (min > min1) min = min1;
    }
}

int main(void)
{
    int i, num;
    printf("\t\t\tMAXIMUM & MINIMUM\n\n");
    printf("Enter the total number of numbers : ");
    scanf("%d", &num);
    printf("Enter the numbers :\n");
    for (i = 1; i <= num; i++)
        scanf("%d", &a[i]);
    max = min = a[1];
    maxmin(1, num);
    printf("Maximum element in the array : %d\n", max);
    printf("Minimum element in the array : %d\n", min);
    return 0;
}

e. Convex Hull: A polygon is said to be convex if, for any two points p1, p2 inside the polygon, the line joining these points lies fully inside the polygon. A convex hull of a set S of points in the plane is defined to be the smallest convex polygon containing all the points of S; the vertices of the convex hull of S form a subset of S. There are two versions of the problem: 1. obtain the vertices of the convex hull, and 2. obtain the vertices of the convex hull in order along the boundary. A naive approach: for each triplet of points of S, check whether the other points lie within the triangle they form; a point inside some triangle cannot be a hull vertex. Equivalently, the convex hull of a set Q of points is the smallest convex polygon P for which each point of Q is either on the boundary of P or in its interior. We denote the convex hull of Q by CH(Q). Think of each point in Q as a nail sticking out from a board; the convex hull is then the shape formed by a tight rubber band that surrounds all the nails. The following figure shows a set of points and its convex hull. There are two popular algorithms which determine the convex hull for a given set of n points: Graham's scan and Jarvis's march.

Graham's scan algorithm: It solves the convex-hull problem by maintaining a stack S of candidate points. Each point of the input set Q is pushed once onto the stack, and the points that are not vertices of CH(Q) are eventually popped from the stack. When the algorithm terminates, stack S contains exactly the vertices of CH(Q), in counterclockwise order of their appearance on the boundary. Informally: form a simple polygon by joining all the points, then remove points at concave angles one at a time, using a recursive approach, continuing until the final convex hull is obtained. The procedure GRAHAM-SCAN takes as input a set Q of points, where |Q| >= 3. It calls the function TOP(S), which returns the point on top of stack S without changing S, and NEXT-TO-TOP(S), which returns the point one entry below the top of stack S without changing S.

GRAHAM-SCAN(Q)
1  let p0 be the point in Q with the minimum y-coordinate,
   or the leftmost such point in case of a tie
2  let (p1, p2, ..., pm) be the remaining points in Q, sorted by polar
   angle in counterclockwise order around p0 (if more than one point
   has the same polar angle, remove all but the one farthest from p0)
3  PUSH(p0, S)
4  PUSH(p1, S)
5  PUSH(p2, S)
6  for i <- 3 to m
7      do while the angle formed by points NEXT-TO-TOP(S), TOP(S),
           and pi makes a nonleft turn
8          do POP(S)
9      PUSH(pi, S)
10 endfor
11 return S

(a) The sequence (p1, p2, ..., p12) of points, numbered in order of increasing polar angle relative to p0, and the initial stack S containing p0, p1 and p2. (b)-(k) Stack S after each iteration of the for loop of lines 6-9. Dashed lines show nonleft turns, which cause points to be popped from the stack. In part (h), for example, the right turn at angle (p7, p8, p9) causes p8 to be popped, and then the right turn at angle (p6, p7, p9) causes p7 to be popped. (l) The convex hull returned by the procedure.

The remainder of the procedure uses the stack S. Lines 3-5 initialize the stack to contain, from bottom to top, the first three points p0, p1 and p2. Figure (a) shows the initial stack S. The for loop of lines 6-9 iterates once for each point in the subsequence (p3, p4, ..., pm). The intent is that after processing point pi, stack S contains, from bottom to top, the vertices of CH({p0, p1, ..., pi}) in counterclockwise order. The while loop of lines 7-8 removes points from the stack if they are found not to be vertices of the convex hull. When we traverse the convex hull counterclockwise, we should make a left turn at each vertex. Thus, each time the while loop finds a vertex at which we make a nonleft turn, the vertex is popped from the stack. (By checking for a nonleft turn, rather than just a right turn, this test precludes the possibility of a straight angle at a vertex of the resulting convex hull. We want no straight angles, since no vertex of a convex polygon may be a convex combination of other vertices of the polygon.) After we pop all vertices that have nonleft turns when heading toward point pi, we push pi onto the stack. Figures (b)-(k) show the state of the stack S after each iteration of the for loop. Finally, GRAHAM-SCAN returns the stack S in line 11.

Disjoint Set :
An application of disjoint-set data structures : One of the many applications of disjoint-set data structures arises in determining the connected components of an undirected graph. For example, the following figure shows a graph with four connected components.

The following figure shows the collection of disjoint sets after each edge is processed.

The procedure CONNECTED-COMPONENTS uses the disjoint-set operations to compute the connected components of a graph. Once CONNECTED-COMPONENTS has been run as a preprocessing step, the procedure SAME-COMPONENT answers queries about whether two vertices are in the same connected component. (The set of vertices of a graph G is denoted by V [G], and the set of edges is denoted by E[G].)
CONNECTED-COMPONENTS(G)
1 for each vertex v ∈ V[G]
2     do MAKE-SET(v)
3 for each edge (u, v) ∈ E[G]
4     do if FIND-SET(u) ≠ FIND-SET(v)
5         then UNION(u, v)

SAME-COMPONENT(u, v)
1 if FIND-SET(u) = FIND-SET(v)
2     then return TRUE
3 else return FALSE

The procedure CONNECTED-COMPONENTS initially places each vertex v in its own set. Then, for each edge (u, v), it unites the sets containing u and v. After all the edges are processed, two vertices are in the same connected component if and only if the corresponding objects are in the same set. Thus, CONNECTED-COMPONENTS computes sets in such a way that the procedure SAME-COMPONENT can determine whether two vertices are in the same connected component. A minimal C sketch of the underlying disjoint-set operations is given below.

Question: Suppose that CONNECTED-COMPONENTS is run on the undirected graph G = (V, E), where V = {a, b, c, d, e, f, g, h, i, j, k} and the edges of E are processed in the following order: (d, i), (f, k), (g, i), (b, g), (a, h), (i, j), (d, k), (b, j), (d, f), (g, j), (a, e), (i, d). List the vertices in each connected component after each iteration.
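Here is a minimal C sketch of the disjoint-set operations used above, with vertices represented as integers 0..n-1 (letters a, b, c, ... mapped to 0, 1, 2, ...). The edge list in the driver is illustrative, not the one in the question.

#include <stdio.h>

#define MAXV 26
int parent[MAXV];

void make_set(int v) { parent[v] = v; }   /* MAKE-SET */

int find_set(int v)                       /* FIND-SET, path compression */
{
    if (parent[v] != v)
        parent[v] = find_set(parent[v]);
    return parent[v];
}

void set_union(int u, int v)              /* UNION: link the two roots */
{
    parent[find_set(u)] = find_set(v);
}

int main(void)
{
    /* Vertices a..j as 0..9; edges (b,d), (e,g), (a,c), (h,i). */
    int edges[][2] = {{1, 3}, {4, 6}, {0, 2}, {7, 8}};
    int i;
    for (i = 0; i < 10; i++)
        make_set(i);
    for (i = 0; i < 4; i++)
        set_union(edges[i][0], edges[i][1]);
    /* SAME-COMPONENT(b, d)? -> TRUE */
    printf("%s\n", find_set(1) == find_set(3) ? "TRUE" : "FALSE");
    /* SAME-COMPONENT(a, e)? -> FALSE */
    printf("%s\n", find_set(0) == find_set(4) ? "TRUE" : "FALSE");
    return 0;
}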

Dynamic Programming
In mathematics and computer science, dynamic programming is a method for solving complex problems by breaking them down into simpler steps. It is applicable to problems having the properties of overlapping subproblems which are only slightly smaller, and optimal substructure. When applicable, the method takes far less time than naive methods. Top-down dynamic programming simply means storing the results of certain calculations, which are later used again since the completed calculation is a sub-problem of a larger calculation. Bottom-up dynamic programming involves formulating a complex calculation as a recursive series of simpler calculations.

As shown in the figure for finding the shortest path in a graph using optimal substructure, a straight line indicates a single edge, a wavy line indicates a shortest path between the two vertices it connects (other nodes on these paths are not shown), and the bold line is the overall shortest path from start to goal.

Dynamic programming is both a mathematical optimization method and a computer programming method. It refers to simplifying a complicated problem by breaking it down into simpler subproblems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively; Bellman called this the "principle of optimality". In terms of mathematical optimization, dynamic programming usually refers to a simplification of a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of value functions V1, V2, ..., Vn, with an argument y representing the state of the system at times i from 1 to n. The definition of Vn(y) is the value obtained in state y at the last time n. The values Vi at earlier times i = n-1, n-2, ..., 2, 1 can be found by working backwards, using a recursive relationship called the Bellman equation. For i = 2, ..., n, Vi-1 at any state y is calculated from Vi by maximizing a simple function (usually the sum) of the gain from decision i-1 and the function Vi at the new state of the system if this decision is made. Since Vi has already been calculated for the needed states, this operation yields Vi-1 for those states. Finally, V1 at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.

There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure, and overlapping subproblems which are only slightly smaller. When the overlapping subproblems are, say, half the size of the original problem, the strategy is called "divide and conquer" rather than "dynamic programming"; this is why merge sort, quicksort, and finding all matches of a regular expression are not classified as dynamic programming problems.

Optimal substructure means that the solution to a given optimization problem can be obtained by combining optimal solutions to its subproblems. Consequently, the first step towards devising a dynamic programming solution is to check whether the problem exhibits such optimal substructure. Such optimal substructures are usually described by means of recursion. For example, given a graph G = (V, E), the shortest path p from a vertex u to a vertex v exhibits optimal substructure: take any intermediate vertex w on this shortest path p.
If p is truly the shortest path, then the path p1 from u to w and the path p2 from w to v are indeed shortest paths between the corresponding vertices (by the simple cut-and-paste argument described in CLRS). Hence, one can easily formulate the solution for finding shortest paths in a recursive manner, which is what the Bellman-Ford algorithm does. The figure shows the subproblem graph for the Fibonacci sequence; the fact that it is not a tree indicates overlapping subproblems. Dynamic programming can be carried out in either of two ways:

Top-down approach: This is the direct fall-out of the recursive formulation of any problem. If the solution to any problem can be formulated recursively using the solutions to its subproblems, and if its subproblems overlap, then one can easily memoize, i.e. store, the solutions to the subproblems in a table.

Bottom-up approach: This is the more interesting case. Once we formulate the solution to a problem recursively in terms of its subproblems, we can try reformulating the problem in a bottom-up fashion: solve the subproblems first and use their solutions to build on and arrive at solutions to bigger subproblems. This is also usually done in tabular form, by iteratively generating solutions to bigger and bigger subproblems from the solutions to smaller ones.

Principle of optimality: This principle states that in an optimal sequence of decisions or choices, each subsequence must also be optimal.

Knapsack 0/1 problem: In this version, an item cannot be broken into smaller pieces: either the item is selected (1) or rejected (0), which is why it is called the knapsack 0/1 problem. Hence either the complete item is taken or the complete item is rejected. To solve the problem by dynamic programming, we set up a table V[1..n, 0..W], with one row for each available object and one column for each weight from 0 to W. The entry V[i, j] will be the maximum value achievable with the objects numbered 1 to i and capacity j, 1 <= i <= n. The solution of the instance can be found in V[n, W]. The entries of the table are filled using the formula

   V[i, j] = max( V[i-1, j], V[i-1, j - wi] + vi )

For out-of-bound entries we define V[0, j] to be 0 when j >= 0 and minus infinity otherwise. Suppose there are 5 items, with weights and values as shown in the following table:

   Item no.   Weight   Value   Value per unit weight
   1          1        1       1.00
   2          2        6       3.00
   3          5        18      3.60
   4          6        22      3.67
   5          7        28      4.00

Let the capacity of the knapsack be m = 11. The solution of the knapsack problem can then be derived as follows:

   Weight limit j:  0  1  2  3  4  5   6   7   8   9   10  11
   w1=1, v1=1       0  1  1  1  1  1   1   1   1   1   1   1
   w2=2, v2=6       0  1  6  7  7  7   7   7   7   7   7   7
   w3=5, v3=18      0  1  6  7  7  18  19  24  25  25  25  25
   w4=6, v4=22      0  1  6  7  7  18  22  24  28  29  29  40
   w5=7, v5=28      0  1  6  7  7  18  22  28  29  34  35  40

The table gives the maximum value of an optimal load as well as the composition of the load. In our example we begin with V[5,11]. Since V[5,11] = V[4,11] but V[5,11] <> V[4, 11-w5] + v5, i.e. V[4,4] + v5 = 7 + 28 = 35, an optimal load cannot include object 5. Next, V[4,11] <> V[3,11] but V[4,11] = V[3, 11-w4] + v4 = V[3,5] + v4 = 18 + 22 = 40, so an optimal load must include object 4.

Having included object 4, we continue with capacity 11 - w4 = 5. Since V[3,5] <> V[2,5] but V[3,5] = V[2, 5-w3] + v3 = V[2,0] + 18 = 18, an optimal load must also include object 3. Continuing with capacity 0: V[2,0] = V[1,0] = V[0,0] = 0, so the optimal load includes neither object 2 nor object 1. Hence the optimal load consists of objects 3 and 4, which constitutes a total weight of 11 units and a total value of 40.
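The table above can be reproduced with a short C sketch of the recurrence V[i,j] = max(V[i-1,j], V[i-1,j-wi] + vi); the array sizes are hard-wired to this five-item instance.

#include <stdio.h>

int main(void)
{
    int w[] = {1, 2, 5, 6, 7};
    int v[] = {1, 6, 18, 22, 28};
    int n = 5, m = 11;
    int V[6][12];                 /* V[i][j]: best value, items 1..i, cap j */
    int i, j;

    for (j = 0; j <= m; j++)
        V[0][j] = 0;              /* no items: value 0 */
    for (i = 1; i <= n; i++)
        for (j = 0; j <= m; j++) {
            V[i][j] = V[i-1][j];                        /* reject item i */
            if (j >= w[i-1] && V[i-1][j - w[i-1]] + v[i-1] > V[i][j])
                V[i][j] = V[i-1][j - w[i-1]] + v[i-1];  /* take item i   */
        }
    printf("optimal value V[%d][%d] = %d\n", n, m, V[n][m]);  /* 40 */
    return 0;
}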

B. Shortest paths between each pair of nodes: Let G = (N, A) be a directed graph, where N is the set of nodes and A the adjacency (weight) matrix. We want to calculate the shortest paths between each pair of nodes. The principle of optimality applies: if k is a node on the shortest path from i to j, then the parts of the path from i to k and from k to j must also be optimal. The algorithm given for this purpose is Floyd's algorithm, which proceeds as follows:
1. Read the weight matrix W.
2. D = W
3. for k = 1 to n
       for i = 1 to n
           for j = 1 to n
               D[i,j] = min( D[i,j], D[i,k] + D[k,j] )
           next j
       next i
   next k
4. Return D.

C. Chain Matrix Multiplication: The multiplication of a matrix A[p x q] by a matrix B[q x r], giving C of order [p x r], is defined by

   c[i,j] = sum over k = 1 to q of a[i,k] * b[k,j],   for 1 <= i <= p, 1 <= j <= r.

This can be written algorithmically as

   for i = 1 to p
       for j = 1 to r
           for k = 1 to q
               C[i,j] = C[i,j] + A[i,k] * B[k,j]
           next k
       next j
   next i

Suppose we want to calculate the product of n matrices:

   M = M1 x M2 x M3 x M4 x ... x Mn

The optimal substructure of this problem is as follows. Suppose that an optimal parenthesization of Ai Ai+1 ... Aj splits the product between Ak and Ak+1. Then the parenthesization of the "prefix" subchain Ai Ai+1 ... Ak within this optimal parenthesization of Ai Ai+1 ... Aj must be an optimal parenthesization of Ai Ai+1 ... Ak. Why? If there were a less costly way to parenthesize Ai Ai+1 ... Ak, substituting that parenthesization into the optimal parenthesization of Ai Ai+1 ... Aj would produce another parenthesization of Ai Ai+1 ... Aj whose cost was lower than the optimum: a contradiction. A similar observation holds for the parenthesization of the subchain Ak+1 Ak+2 ... Aj in the optimal parenthesization of Ai Ai+1 ... Aj: it must be an optimal parenthesization of Ak+1 Ak+2 ... Aj.

m[i, j ] = m[i, k] + m[k + 1, j ] + pi-1 pk pj.


This recursive equation assumes that we know the value of k, which we do not. There are only j - i possible values for k, however, namely k = i, i + 1, ..., j - 1. Since the optimal parenthesization must use one of these values for k, we need only check them all to find the best. Thus, our recursive definition for the minimum cost of parenthesizing the product Ai Ai+1 ... Aj becomes

   m[i, j] = 0                                                      if i = j,
   m[i, j] = min over i <= k < j of { m[i, k] + m[k+1, j] + pi-1 pk pj }   if i < j.

The m[i, j] values give the costs of optimal solutions to subproblems. To help us keep track of how to construct an optimal solution, let us define s[i, j] to be a value of k at which we can split the product Ai Ai+1 Aj to obtain an optimal parenthesization. That is, s[i, j] equals a value k such that m[i, j] = m[i, k] + m[k + 1, j] + pi-1 pk pj.
MATRIX-CHAIN-ORDER(p)
1  n <- length[p] - 1
2  for i <- 1 to n
3      m[i, i] <- 0
4  next i
5  for l <- 2 to n            (l is the chain length)
6      for i <- 1 to n - l + 1
7          j <- i + l - 1
8          m[i, j] <- infinity
9          for k <- i to j - 1
10             q <- m[i, k] + m[k + 1, j] + pi-1 pk pj
11             if q < m[i, j] then
12                 m[i, j] <- q
13                 s[i, j] <- k
14             endif
15         next k
16     next i
17 next l
18 return m and s

PRINT-OPTIMAL-PARENS(s, i, j)
1 if i = j
2     then print "A"i
3     else print "("
4         PRINT-OPTIMAL-PARENS(s, i, s[i, j])
5         PRINT-OPTIMAL-PARENS(s, s[i, j] + 1, j)
6         print ")"
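Here is a C sketch of MATRIX-CHAIN-ORDER and PRINT-OPTIMAL-PARENS for one illustrative instance: four matrices with dimension vector p = {5, 4, 6, 2, 7}. The dimensions are assumed for the example, not taken from the text.

#include <stdio.h>
#include <limits.h>

#define N 4
int p[N + 1] = {5, 4, 6, 2, 7};       /* Ai has dimensions p[i-1] x p[i] */
int m[N + 1][N + 1], s[N + 1][N + 1];

void print_parens(int i, int j)        /* PRINT-OPTIMAL-PARENS */
{
    if (i == j)
        printf("A%d", i);
    else {
        printf("(");
        print_parens(i, s[i][j]);
        print_parens(s[i][j] + 1, j);
        printf(")");
    }
}

int main(void)
{
    int i, j, k, l, q;
    for (i = 1; i <= N; i++)
        m[i][i] = 0;                   /* chains of length 1 cost nothing */
    for (l = 2; l <= N; l++)           /* l is the chain length */
        for (i = 1; i <= N - l + 1; i++) {
            j = i + l - 1;
            m[i][j] = INT_MAX;
            for (k = i; k < j; k++) {  /* try every split point */
                q = m[i][k] + m[k+1][j] + p[i-1] * p[k] * p[j];
                if (q < m[i][j]) {
                    m[i][j] = q;
                    s[i][j] = k;
                }
            }
        }
    printf("minimum multiplications = %d\n", m[1][N]);   /* 158 */
    print_parens(1, N);                /* ((A1(A2A3))A4) */
    printf("\n");
    return 0;
}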

PARALLEL MODELS

Basic concept: In computer science, a parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, with the pieces then put back together at the end to get the correct result.

Some algorithms are easy to divide up into pieces like this. For example, splitting up the job of checking all of the numbers from one to a hundred thousand to see which are primes could be done by assigning a subset of the numbers to each available processor, and then putting the lists of positive results back together. Most of the available algorithms to compute pi, on the other hand, cannot easily be split up into parallel portions: they require the results from a preceding step to effectively carry on with the next step. Such problems are called inherently serial problems. Iterative numerical methods, such as Newton's method or the three-body problem, are also algorithms which are inherently serial. Some problems are very difficult to parallelize, although they are recursive; one such example is the depth-first search of a graph.

Parallel algorithms are valuable because it is faster to perform large computing tasks with them than with a serial (non-parallel) algorithm: it is far more difficult to construct a computer with a single fast processor than one with many slow processors having the same aggregate throughput. There are also certain theoretical limits to the potential speed of serial processors. On the other hand, every parallel algorithm has a serial part, and so parallel algorithms have a saturation point: after that point, adding more processors does not yield any more throughput but only increases the overhead and cost.

The cost or complexity of serial algorithms is estimated in terms of the space (memory) and time (processor cycles) that they take. Parallel algorithms need to optimize one more resource: the communication between different processors. There are two ways parallel processors communicate, shared memory or message passing. Shared-memory processing needs additional locking for the data, imposes the overhead of additional processor and bus cycles, and also serializes some portion of the algorithm. Message-passing processing uses channels and message boxes, but this communication adds transfer overhead on the bus, additional memory for queues and message boxes, and latency in the messages. Designs of parallel processors use special buses, like crossbars, so that the communication overhead will be small, but it is the parallel algorithm that decides the volume of the traffic.

Another problem with parallel algorithms is ensuring that they are suitably load balanced. For example, checking all numbers from one to a hundred thousand for primality is easy to split amongst processors; however, some processors will get more work to do than others, and the idle ones will wait until the loaded processors complete.

Generally, algorithms execute on machines which perform one computation at a time. Sometimes, thanks to parallelism, more than one operation may be performed simultaneously, e.g. the fetching of the next instruction is carried out while the last instruction is still being executed, or input/output operations are executed along with other computations. But in either of these cases, while one task is being executed the others have to wait. Sometimes, however, we are in need of operations which require real parallelism, e.g. a situation where more than one variable needs to be assigned a value simultaneously. Consider an array that is being changed by a number of parallel processors, with one processor for each array element, and an instruction that stops the algorithm when every array element has been set to zero.
Generally, an algorithm executes on a machine that performs one computation at a time. Sometimes a little parallelism is present even there: the fetch of the next instruction may be carried out while the last instruction is still being executed, or input/output operations may proceed alongside other computations. But in either case, while one task is being executed the other has to wait. Sometimes, however, we need operations with genuine parallelism, for example when more than one variable must be assigned a value simultaneously. Consider an array that is being changed by a number of parallel processors, with one processor for each array element, together with an instruction that says to stop the algorithm when every array element has been set to zero. The question arises: how does one processor come to know that all elements in the array have become zero, when each individual element is being processed by a different processor? Either some common flag must be shared, or the processors need to pass messages to one another.

One popular model for such parallel computation is the Parallel Random Access Machine, or PRAM. This model is very commonly used and easy to understand. In the PRAM model a number of ordinary, sequential processors are assumed to share a global memory. Each processor has the usual set of arithmetic and logical instructions, which it can execute in parallel with whatever is happening on the other processors. All the processors execute the same program but on data from one or more memory locations; such a model is therefore sometimes described as single instruction stream, multiple data streams. Each processor has access to the whole of the global memory. At each step it may either read from or write to no more than one storage location, and when one processor is reading a value, no other processor may write to the same memory location.

Parallelism in Computers : Parallelism in a digital computer means performing more than one task at the same time. Examples:

IO chips : Most computers contain special circuits for IO devices which allow some tasks to be performed in parallel with the CPU.

Pipelining of Instructions : Some CPUs pipeline the execution of instructions, so that several instructions are in different stages of execution at the same time.

Multiple Arithmetic Units (AU) : Some CPUs contain multiple arithmetic units so they can perform more than one arithmetic operation at the same time.

But here we are interested in parallelism involving multiple CPUs.

Common Terms for Parallelism


Concurrent Processing : A program is divided into multiple processes which are run on a single processor. The processes are time-sliced on the single processor.

Distributed Processing : A program is divided into multiple processes which are run on multiple distinct machines. The machines are usually connected by a LAN and are typically workstations, each running multiple programs.

Parallel Processing : A program is divided into multiple processes which are run on multiple processors. The processors are normally in one machine, execute one program at a time, and have high-speed communication between them.
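As a hedged illustration of the first and third of these terms (not part of the original text): in CPython, threads are time-sliced within a single interpreter, which approximates concurrent processing, while a process pool can place work on multiple processors, which approximates parallel processing.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(n):
    # CPU-bound work standing in for one process of the divided program.
    return sum(range(n))

if __name__ == "__main__":
    jobs = [10**6] * 4
    # Concurrent processing: tasks time-sliced on one processor.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(task, jobs)))
    # Parallel processing: tasks spread over multiple processors in one machine.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(task, jobs)))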

Parallel Programming
Issues in parallel programming not found in sequential programming:

Task decomposition, allocation and sequencing
o Breaking down the problem into smaller tasks (processes) that can be run in parallel
o Allocating the parallel tasks to different processors
o Sequencing the tasks in the proper order
o Using the processors efficiently

Communication of interim results between processors
o The goal is to reduce the cost of communication between processors; task decomposition and allocation affect communication costs

Synchronization of processes
o Some processes must wait at predetermined points for results from other processes

Different machine architectures

Performance Issues

Scalability
o Using more nodes should allow a job to run faster, or allow a larger job to run in the same time

Load Balancing
o All nodes should have the same amount of work
o Avoid having nodes idle while others are computing

Bottlenecks
o Communication bottlenecks: nodes spend too much time passing messages, or too many messages travel on the same path
o Serial bottlenecks

Communication
o Message passing is slower than computation
o Maximize computation per message
o Avoid making nodes wait for messages

Parallel Machines : Parameters used to describe or classify parallel computers:

Type and number of processors
Processor interconnections
Global control
Synchronous vs. asynchronous operation

Type and number of processors

Massively parallel : computer systems with thousands of processors
o Parallel supercomputers such as the CM-5 and Intel Paragon

Coarse-grained parallelism : a few (~10) processors, usually high powered, in a system
o Starting to be common in Unix workstations

Processor interconnections : Parallel computers may be loosely divided into two groups:
Shared Memory (or Multiprocessor)
Message Passing (or Multicomputer)

A. Shared Memory or Multiprocessor : Individual processors have access to a common shared memory module (or modules). Examples are the Alliant, the Cray series, and some Sun workstations. Features: easy to build and program, but limited to a small number of processors (20 - 30 for a bus-based multiprocessor).

Figure : Multiprocessor with a processor-memory interconnection network

B. Message Passing or Multicomputer : Individual processors each have a local memory, and the processors communicate via a communication network. Examples are the Connection Machine series (CM-2, CM-5), Intel Paragon, nCube, Transputers, and the Cosmic Cube. Features: can scale to thousands of processors.

Figure : Mesh communication network

The Intel Paragon is a mesh machine.

Global Control : machines are classified by their instruction and data streams as SISD, SIMD, MIMD, or MISD.

SISD - Single Instruction Single Data

A sequential computer.

MISD - Multiple Instruction Single Data

Each processor can do different things to the same input. Example: detecting shapes in an image, where each processor searches the same input image for a different shape.

SIMD - Single Instruction Multiple Data

Each processor does the same thing to different data. It requires a global synchronization mechanism, and each processor knows its own id number. Note that shared-memory computers are not necessarily SIMD. Example: to add two arrays A and B, in parallel each processor K reads A[K] and B[K] and writes A[K] + B[K] into C[K].
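A minimal sketch of this SIMD addition, simulated in Python; each loop iteration stands for one processor executing the same instruction on its own data element (real SIMD hardware would execute them in lock step).

def simd_add(A, B):
    C = [0] * len(A)
    # Conceptually, processor K executes this single instruction for its own K.
    for k in range(len(A)):
        C[k] = A[k] + B[k]
    return C

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))   # [11, 22, 33, 44]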

MIMD - Multiple Instruction Multiple Data

Each processor can run a different program on different data. MIMD machines can use shared memory or message passing, and an MIMD machine can simulate SIMD or SISD if there is a global synchronization mechanism. Communication is the main issue, and MIMD is harder to program than SIMD.

Parallel Algorithms - Models of Computation

PRAM
Parallel Random-Access Machine

It is the most commonly used model for expressing parallel algorithms. In this model the sequential processors are assumed to share a global memory, and each processor may have a local memory to store local results. The model is MIMD, but algorithms for it tend to be SIMD: each processor has the usual set of arithmetic and logical instructions that it can execute in parallel with whatever is happening on the other processors, and the active processors all execute the same instruction. PRAM variants differ in how they resolve simultaneous access to shared memory:

Exclusive Read Exclusive Write (EREW) : no simultaneous access to a single shared-memory location.

Concurrent Read Exclusive Write (CREW) : simultaneous reads of a single shared-memory location are allowed.

Concurrent Read Concurrent Write (CRCW) : simultaneous reads and writes are allowed on a single shared-memory location.
o Common CRCW PRAM allows concurrent writes only when all processors are writing the same value.
o Arbitrary CRCW PRAM allows an arbitrary processor to succeed at writing to the memory location.
o Priority CRCW PRAM allows the processor with the minimum index to succeed.

Example primitives:
read(X, Y) : copy X from shared memory to local memory Y
write(U, V) : copy U from local memory to shared memory V

Broadcasting Data in a CREW PRAM Model : In a CREW model more than one processor can read from a single memory location at one time, which allows a very rapid transfer of a data value to the other processors. The following algorithm shows such parallelism:

Start
  P1 writes the data value into M1
  Parallel Start
    for k = 2 to p do
      Pk reads the data value from M1
    end for
  Parallel End
End

This broadcast operation takes two cycles: the first writes the data into memory, and the second has all of the processors read the value. This speed is only possible because of the concurrent read capability.
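A sketch of this two-cycle broadcast, simulated in Python with the shared memory as a list (the concurrent reads of M[0] are modeled by an ordinary loop):

def crew_broadcast(p, value):
    M = [None] * p        # global shared memory M1..Mp (0-indexed here)
    local = [None] * p    # each processor's local copy of the value
    M[0] = value          # cycle 1: P1 writes the data value into M1
    local[0] = value
    for k in range(1, p): # cycle 2: P2..Pp all read M1 concurrently (legal in CREW)
        local[k] = M[0]
    return local

print(crew_broadcast(4, 99))   # [99, 99, 99, 99]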

Broadcasting Data in an EREW PRAM Model : In an exclusive read model, only one processor can read the data that was written by P1. If we were to just loop through the rest of the processors, we would have a sequential algorithm and would lose all of the power that we added with parallelism. Instead, the second processor's read / process / write cycle can be used to write the data value into a second memory location; on the next pass, two more processors can read the data value, and if they then write it into new locations, four processors can read on the pass after that. This gives us the following algorithm:

P1 writes the data value into M1
procLoc = 1
for j = 1 to lg p do
  Parallel Start
    for k = procLoc + 1 to 2 * procLoc do
      Pk reads M(k - procLoc)
      Pk writes to Mk
    end for k
  Parallel End
  procLoc = procLoc * 2
end for j

This algorithm first writes the data value to location M1. On the first pass of the outer loop, P2 reads the data value and writes it to M2, and procLoc becomes 2. The second pass has P3 and P4 read from locations M1 and M2 and then write to locations M3 and M4, and procLoc becomes 4. The third pass has P5 through P8 read from locations M1 through M4 and then write to locations M5 through M8. After the second-to-last pass, half of the processors have the data value and have written it to M1 through Mp/2, which allows the second half of the processors to read the data value on the last pass. Because the read and the write can each be done in one instruction cycle, the parallel block takes constant time, and the outer loop executes that block lg p times. Therefore, this parallel broadcast algorithm does O(lg p) operations.
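A Python simulation of this doubling scheme (a sketch with memory 0-indexed; in each round no memory cell is read or written by more than one processor, as EREW requires):

def erew_broadcast(p, value):
    M = [None] * p
    M[0] = value                 # P1 writes the data value into M1
    proc_loc = 1
    rounds = 0
    while proc_loc < p:          # lg p passes of the outer loop
        # In parallel: P(procLoc+1)..P(2*procLoc) each read a distinct
        # cell M(k - procLoc) and write their own distinct cell Mk.
        for k in range(proc_loc, min(2 * proc_loc, p)):
            M[k] = M[k - proc_loc]
        proc_loc *= 2
        rounds += 1
    return M, rounds             # rounds == ceil(lg p)

print(erew_broadcast(8, 42))     # ([42, 42, 42, 42, 42, 42, 42, 42], 3)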

Finding the Maximum Value in a List : For this and our other operations on lists, we assume that the list has been loaded into memory locations M1 through MN and that we have p = N / 2 processors. On the first pass, processor Pi compares the values in locations M2i and M2i+1 and writes the larger of the two into location Mi. On the second pass, only half of the processors are needed to compare pairs of elements in memory locations M1 through MN/2 and write the larger of each pair into locations M1 through MN/4. This gives the following algorithm:

count = N / 2
for i = 1 to (lg count) + 1 do
  Parallel Start
    for j = 1 to count do
      Pj reads M2j into X and M2j+1 into Y
      if X > Y
        Pj writes X into Mj
      else
        Pj writes Y into Mj
      end if
    end for j
  Parallel End
  count = count / 2
end for i

Each pass of this algorithm cuts in half the number of values that have the potential to be the largest, until eventually we are left with just one value. This is very much like the tournament method used with a single processor. If we have p < N / 2, we can perform a preprocessing step that reduces the number of values to 2 * p, and then the algorithm can continue as shown above. There are lg N passes, putting the time at O(lg N). Since the cost is the time multiplied by the number of processors, the cost for this algorithm is N / 2 * O(lg N), or more simply O(N lg N). The simple sequential algorithm takes only O(N), so this parallel version is more costly, although it does run much faster.

If parallel computing is really to be beneficial, there must be a faster alternative method that costs no more than the sequential version. If we look closely, we see that the problem with the cost is the number of processors, so we need to consider how to reduce that number. If we want the total cost at the optimal level of O(N) and the run time of the parallel algorithm is O(lg N), we can use only N / lg N processors. This also means that in a first pass each processor must handle N / (N / lg N) = lg N values. This results in the following alternative parallel algorithm:

Parallel Start
  for j = 1 to N / lg N do
    Pj finds the maximum of M(1 + (j-1) * lg N) through M(j * lg N) using the sequential algorithm
    Pj writes the maximum to Mj
  end for
Parallel End
count = (N / lg N) / 2
for i = 1 to (lg count) + 1 do
  Parallel Start
    for j = 1 to count do
      Pj reads M2j into X and M2j+1 into Y
      if X > Y
        Pj writes X into Mj
      else
        Pj writes Y into Mj
      end if
    end for j
  Parallel End
  count = count / 2
end for i

In this version, we have a preprocessing step in which each processor runs the sequential algorithm on a list of lg N elements, which you will recall takes O(lg N) operations. The rest of the algorithm is our original attempt with the number of processors reduced to (N / lg N) / 2 (the preprocessing step itself uses N / lg N processors), and it also runs in O(lg N) time. So the total cost of this algorithm is (N / lg N) * O(lg N) = O(N). This last parallel version has the same cost as the sequential version, but it runs in a fraction of the time.
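A Python simulation of the tournament reduction (a sketch assuming N is a power of two and using 0-indexed memory; the inner loop body stands for the work that all count processors do in one parallel step):

def parallel_max(values):
    M = list(values)             # shared memory M1..MN (0-indexed here)
    count = len(M) // 2          # one virtual processor per pair
    while count >= 1:
        for j in range(count):   # one parallel step across P1..Pcount
            x, y = M[2 * j], M[2 * j + 1]
            M[j] = x if x > y else y
        count //= 2
    return M[0]

print(parallel_max([7, 3, 9, 1, 4, 8, 2, 6]))   # 9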
