
IOSR Journal of Computer Engineering (IOSRJCE), ISSN: 2278-0661, ISBN: 2278-8727, Volume 7, Issue 3 (Nov.-Dec. 2012), pp. 30-35, www.iosrjournals.org

T-Sort: A tagging technique to optimize memory utilization in radix sort


Dhanyatha Manjunath (1), Eshita Mathur (2)

(1) Computer Science Department, SDMCET/VTU, India; (2) Information Science Department, SDMCET/VTU, India

Abstract: We live in an era of data explosion that calls for novel out-of-core techniques for sorting data and for searching it for specific information. Sorting is an important problem with ubiquitous applications. Several asymptotically optimal sorting algorithms are known, and the focus has now shifted to developing algorithms for problem sizes of practical interest. Space and time complexity are the two criteria generally used to decide whether one algorithm is "better" than another. In this paper we present T-Sort, a novel linear sorting algorithm that alleviates the problem of memory wastage by using significantly less memory than existing linear sorting techniques. The main aim of T-Sort is to reduce space complexity relative to other linear sorts while keeping their stability and time complexity intact. The paper includes two programs, one implementing radix sort, among the best linear sorting techniques, and one implementing T-Sort. The results show that the proposed technique has an edge over radix sort in space complexity.

Keywords: Counting Sort: a sorting technique [1]; Dlarge: number of digits in the largest of the N elements; Radix Sort: a sorting technique; Space Complexity: the space required by a sorting operation; T-Sort: the proposed algorithm, claiming an optimization over Radix Sort; Time Complexity: the time required to sort.

I. Introduction

Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps because it is hard to solve efficiently despite its simple, familiar statement. Computer scientists have long been interested in designing efficient sorting algorithms, since sorting is an important part of managing data. There are several criteria for judging the efficiency of an algorithm, time and space complexity being paramount. Other factors include algorithmic complexity, startup costs, additional space requirements, use of recursion (function calls are expensive and consume stack space), worst-case behavior, assumptions about the input data, caching, and behavior on already-sorted or nearly-sorted data. No single sorting technique is best on all of these measures simultaneously, but techniques can be compared on a specific measure in a specific environment. The newly proposed algorithm addresses the following issues to a significant extent:
1. Memory is an important factor when sorting millions of records. Even though memory is getting cheaper day by day, reducing its usage significantly cuts costs by a large margin.
2. Memory utilization is an important factor in mobile devices, on which our dependence is undeniable today.
3. In applications dealing with ever-increasing amounts of data, bringing in the right data at the right time becomes critical for reporting and analysis. Without accurate information, the data being analyzed and reported on becomes meaningless. Hence it is increasingly important to keep data sorted, which makes instant searching easier.

II. ANALYSIS OF EXISTING LINEAR SORTING ALGORITHMS

This section briefly describes the existing linear sorting algorithms: bucket sort, radix sort and counting sort. The newly developed T-Sort algorithm is explained in Section III.

2.1 Bucket Sort
Bucket sort is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either with a different sorting algorithm or by recursively applying bucket sort. Bucket sort is not a comparison sort, and estimates of its computational complexity involve the number of buckets.
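A minimal C sketch of the bucket sort just described, assuming the keys are doubles in [0, 1) (a common textbook setting, not one stated in this paper). It uses n buckets laid out contiguously in one scratch array and sorts each bucket with insertion sort; the names bucket_sort, bucket_of and insertion_sort are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Map a value assumed to lie in [0, 1) to one of nb buckets. */
static size_t bucket_of(double x, size_t nb)
{
    assert(x >= 0.0 && x < 1.0);
    size_t b = (size_t)(x * (double)nb);
    return b < nb ? b : nb - 1;   /* guard against rounding at the top edge */
}

/* Insertion sort of a[lo..hi-1], used to sort each bucket. */
static void insertion_sort(double *a, size_t lo, size_t hi)
{
    for (size_t i = lo + 1; i < hi; i++) {
        double v = a[i];
        size_t j = i;
        while (j > lo && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

/* Bucket sort of n doubles in [0, 1) using n buckets stored contiguously.
   Returns 0 on success, -1 if memory allocation fails. */
int bucket_sort(double *a, size_t n)
{
    if (n < 2) return 0;
    size_t nb = n;
    size_t *start = calloc(nb + 1, sizeof *start); /* bucket boundaries */
    size_t *fill = malloc(nb * sizeof *fill);      /* next free slot per bucket */
    double *tmp = malloc(n * sizeof *tmp);
    if (!start || !fill || !tmp) {
        free(start); free(fill); free(tmp);
        return -1;
    }
    for (size_t i = 0; i < n; i++)      /* count elements per bucket */
        start[bucket_of(a[i], nb) + 1]++;
    for (size_t b = 0; b < nb; b++)     /* prefix sums: bucket b is tmp[start[b]..start[b+1]-1] */
        start[b + 1] += start[b];
    memcpy(fill, start, nb * sizeof *fill);
    for (size_t i = 0; i < n; i++)      /* scatter elements into their buckets */
        tmp[fill[bucket_of(a[i], nb)]++] = a[i];
    for (size_t b = 0; b < nb; b++)     /* sort each bucket individually */
        insertion_sort(tmp, start[b], start[b + 1]);
    memcpy(a, tmp, n * sizeof *tmp);
    free(start); free(fill); free(tmp);
    return 0;
}
```

For example, calling bucket_sort(a, 10) on {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68} leaves the array in ascending order. Storing the buckets contiguously (count, prefix-sum, scatter) avoids one allocation per bucket at the cost of a second pass over the input; the scattered writes in that pass are the unstructured memory accesses mentioned in Section 2.1.2.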



2.1.1 Analysis: Bucket Sort
In the best case there is one element per bucket, and the running time is $\Theta(n + k)$, where k is the number of buckets. In the worst case all elements end up in the same bucket, and the running time is $\Theta(n^2)$. Assuming evenly distributed keys, probabilistic analysis shows that the average-case running time is $\Theta(n + k)$.

2.1.2 Disadvantages: Bucket Sort
Bucket sort performs many operations. Memory access in the loop that distributes the elements to their buckets is unstructured, so when a large number of elements is sorted this causes significant cache misses, and bucket sort is then no faster than quicksort.

2.2 Radix Sort
Radix sort is a fast and intuitive sorting algorithm. It orders the elements by the digits at corresponding places of the numbers/strings. Radix sort solves the card-sorting problem counterintuitively by sorting on the least significant digit first. The cards are then combined into a single deck, with the cards in the 0 bin preceding the cards in the 1 bin, which precede the cards in the 2 bin, and so on. The entire deck is then sorted again on the second least significant digit and recombined in the same manner. The process continues until the cards have been sorted on all d digits. Remarkably, at that point the cards are fully sorted on the d-digit number, so only d passes through the deck are required.

2.2.1 Analysis: Radix Sort
Radix sort is very simple and is in fact one of the fastest sorting algorithms for numbers or strings of letters.

Table 1: Analysis of Radix Sort
Data type       Time    Memory      Stable
Integers        O(N)    11N + K     Yes
Alphabetical    O(N)    27N + K     Yes
Alphanumeric    O(N)    37N + K     Yes
String          O(N)    257N + K    Yes

Here K is a constant: the number of different symbols of which a data element is composed.

2.2.2 Disadvantages: Radix Sort
Still, radix sort has some trade-offs that can make it less preferable than other sorts. It takes more space than other sorting algorithms because, in addition to the array being sorted, it needs a sublist for each possible digit or letter. If the list consists of pure English words, at least 26 sublists are needed; if it consists of alphanumeric words or sentences, more than 36 sublists are needed; and if it consists of strings over all 256 possible byte values (extended ASCII), 256 sublists are required. Hence the space requirement grows as K increases. Space can be reduced by implementing each bucket as a dynamic queue, but then each pass of radix sort becomes significantly longer, because creating and manipulating each bucket incurs the overhead of allocating and deallocating memory for the dynamic queue [1,3]. For many programs that need a fast sort, radix sort is a good choice, but this overhead prevents its use in certain applications.

2.3 Counting Sort
Counting sort [1] works by counting the occurrences of each data value. It assumes that each of the n data items lies in the range 1..k, for some integer k. The algorithm then determines, for each input element x, the number of elements less than x, and uses this information to place x directly into its correct position in the output.

2.3.1 Analysis: Counting Sort
The time complexity of counting sort is O(n + k), and counting sort is stable.
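A minimal C sketch of counting sort as described in Section 2.3, assuming int keys in the range 1..k; the function name counting_sort and its separate output array are illustrative choices. After the prefix-sum loop, count[v] holds the number of keys less than or equal to v (so count[x - 1] is the number of keys less than x), and scanning the input backwards places equal keys in their input order, which is what makes the sort stable.

```c
#include <assert.h>
#include <stdlib.h>

/* Stable counting sort of n keys in the range 1..k, written to out.
   Returns 0 on success, -1 if allocating the count array fails. */
int counting_sort(const int *in, int *out, size_t n, int k)
{
    assert(k >= 1);
    size_t *count = calloc((size_t)k + 1, sizeof *count); /* count[1..k] used */
    if (!count) return -1;
    for (size_t i = 0; i < n; i++) {          /* occurrences of each value */
        assert(in[i] >= 1 && in[i] <= k);
        count[in[i]]++;
    }
    for (int v = 2; v <= k; v++)              /* count[v] = number of keys <= v */
        count[v] += count[v - 1];
    for (size_t i = n; i-- > 0; )             /* backwards scan keeps equal keys in input order */
        out[--count[in[i]]] = in[i];
    free(count);
    return 0;
}
```

The count array has k + 1 entries regardless of n, which is the source of the memory wastage discussed in Section V when the key range is much larger than the number of elements.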

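To make the memory cost described in Section 2.2.2 concrete, here is a minimal C sketch of least-significant-digit radix sort that keeps one fixed-size sublist per decimal digit. It assumes non-negative int keys with at most d digits; the function name radix_sort_buckets and the fixed-capacity bucket layout are illustrative choices, not taken from the authors' program. Because any bucket may receive all n elements, each of the 10 buckets is sized for n, which together with the input array gives roughly the 11N figure of Table 1 for integers.

```c
#include <assert.h>
#include <stdlib.h>

#define RADIX 10  /* K = 10 symbols: the decimal digits */

/* LSD radix sort of n non-negative ints in d passes, with one bucket
   (sublist) per digit. Each bucket can hold all n elements, so the buckets
   alone occupy RADIX * n ints on top of the n-element input array.
   Returns 0 on success, -1 if allocation fails. */
int radix_sort_buckets(int *a, size_t n, int d)
{
    if (n == 0) return 0;
    int *bucket = malloc((size_t)RADIX * n * sizeof *bucket);
    if (!bucket) return -1;
    size_t fill[RADIX];
    long long place = 1;                        /* 10^(pass-1) */
    for (int pass = 1; pass <= d; pass++, place *= RADIX) {
        for (int b = 0; b < RADIX; b++) fill[b] = 0;
        for (size_t i = 0; i < n; i++) {        /* distribute by the current digit */
            assert(a[i] >= 0);
            int digit = (int)((a[i] / place) % RADIX);
            bucket[(size_t)digit * n + fill[digit]++] = a[i];
        }
        size_t out = 0;                         /* collect bucket 0, then 1, ..., 9 */
        for (int b = 0; b < RADIX; b++)
            for (size_t j = 0; j < fill[b]; j++)
                a[out++] = bucket[(size_t)b * n + j];
    }
    free(bucket);
    return 0;
}
```

Allocating each bucket with capacity n keeps the sketch simple and avoids the per-element allocation of the dynamic-queue variant, at exactly the memory cost that T-Sort sets out to remove.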

III. T-Sort
T-Sort is a smart, memory-efficient sorting algorithm. Like radix sort, it orders the elements by the digits at corresponding places of the numbers/strings. The algorithm is given below, followed by an example. Consider the following 5 numbers: List = {443, 124, 232, 431, 132}.

[Figure 3.1: status of the arrays before sorting]

T-Sort works like radix sort: it sorts the numbers by place value, first on the least significant digit, then on the second least significant digit, and so on up to the most significant digit d. Unlike radix sort, which moves the elements into k different buckets, T-Sort leaves them in the input list and records, in a tag field, the sequence of elements that share a digit. For each digit, FO holds the index of the first element with that digit and LO the index of the last one, and tag[i] holds the index of the next element with the same digit as element i (or -1 if there is none). Where radix sort collects the elements from its buckets for the next pass, T-Sort collects them by following the sequence recorded in the tag field, copying them into a second array of the same size. After the last pass the list is sorted. The elements of FO and LO are reset to -1 at the start of every pass. In the given example the sort runs for 3 passes, since the largest number has 3 digits.

Algorithm: Advanced-Radix Sort (List1, List2, N, d, k)
Step 0: Set tag[i] = -1 for i = 0 to N-1, and set pass = 1. Repeat Steps 1 to 8 while pass <= d.
Step 1: For link = 0 to k-1: FO[link] = LO[link] = -1.
Step 2: For index = 0 to N-1, repeat Steps 3 and 4.
Step 3: If pass is odd: digit = (int)(List1[index] / 10^(pass-1)) mod 10; otherwise: digit = (int)(List2[index] / 10^(pass-1)) mod 10.
Step 4: If LO[digit] = -1: FO[digit] = index; otherwise: tag[LO[digit]] = index. In both cases: LO[digit] = index.
Step 5: [Collect elements from List1 into List2 on odd passes, or from List2 into List1 on even passes.] Set out = 0. For link = 0 to k-1, repeat Steps 6 and 7.
Step 6: If FO[link] != -1: index = FO[link]; if pass is odd, List2[out] = List1[index], otherwise List1[out] = List2[index]; out = out + 1; then perform Step 7. If FO[link] = -1, continue with the next link.
Step 7: new_tag = tag[index]. While new_tag != -1: cur_index = new_tag; if pass is odd, List2[out] = List1[cur_index], otherwise List1[out] = List2[cur_index]; out = out + 1; new_tag = tag[cur_index].
Step 8: For i = 0 to N-1: tag[i] = -1. Set pass = pass + 1.
Step 9: Stop.
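The steps above can be sketched in C as follows. This is a minimal sketch assuming non-negative int keys in base 10 (k = 10); the function name t_sort, the macro T_SORT_K and the choice to return a pointer to whichever list holds the result are illustrative, not taken from the authors' program. Steps 5 to 7 are merged into one loop that walks each digit's tag chain starting from FO[link], and tag[] is declared long (the paper's struct uses int) so that the -1 sentinel coexists with large indices.

```c
#include <assert.h>
#include <stdlib.h>

#define T_SORT_K 10  /* k: number of symbols (decimal digits) */

/* Sketch of T-Sort for n non-negative ints with at most d decimal digits.
   list1 holds the input, list2 is scratch space of the same length, and
   tag[] links elements that share the current digit. Returns the array
   holding the sorted result (list2 if d is odd, list1 if d is even),
   or NULL if allocating tag[] fails. */
int *t_sort(int *list1, int *list2, size_t n, int d)
{
    long fo[T_SORT_K], lo[T_SORT_K];          /* first / last index per digit */
    long *tag = malloc((n > 0 ? n : 1) * sizeof *tag);
    if (!tag) return NULL;
    for (size_t i = 0; i < n; i++) tag[i] = -1;         /* Step 0 */

    int *src = list1, *dst = list2;           /* odd pass: List1 -> List2 */
    long long place = 1;                      /* 10^(pass-1) */
    for (int pass = 1; pass <= d; pass++, place *= T_SORT_K) {
        for (int link = 0; link < T_SORT_K; link++)     /* Step 1 */
            fo[link] = lo[link] = -1;
        for (size_t index = 0; index < n; index++) {    /* Steps 2-4 */
            assert(src[index] >= 0);
            int digit = (int)((src[index] / place) % T_SORT_K);
            if (lo[digit] == -1)
                fo[digit] = (long)index;      /* first element with this digit */
            else
                tag[lo[digit]] = (long)index; /* append to the digit's chain */
            lo[digit] = (long)index;
        }
        size_t out = 0;                                 /* Steps 5-7 */
        for (int link = 0; link < T_SORT_K; link++)
            for (long cur = fo[link]; cur != -1; cur = tag[cur])
                dst[out++] = src[cur];
        for (size_t i = 0; i < n; i++) tag[i] = -1;     /* Step 8 */
        int *swap = src; src = dst; dst = swap;         /* next pass reads the other list */
    }
    free(tag);
    return src;
}
```

For the example list, t_sort(l1, l2, 5, 3) with l1 = {443, 124, 232, 431, 132} returns l2 holding {124, 132, 232, 431, 443}. Apart from the two lists, the only O(n) storage is tag[], which is where the 3N worst case of Section 4.2 comes from.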

Fig. 3.2 shows the status of List1, FO and LO after pass 1, together with the elements collected in List2; List2 is then the input array to pass 2. Fig. 3.3 shows the status of List1, FO and LO after pass 2; List1 is then the input array to pass 3. Fig. 3.4 shows the status after pass 3. On an odd pass the elements of List1 are collected into List2, and on an even pass the elements of List2 are collected into List1, so after d passes the sorted list is in List2 if d is odd and in List1 if d is even. In each pass, the values in the tag field are populated from the list being sorted in that pass.

[Figure 3.2: status of the arrays after pass 1]

After pass 1, the numbers have been sorted by their least significant digit and stored in List2. The status after pass 2 is shown in Fig. 3.3.

[Figure 3.3: status of the arrays after pass 2]

After pass 2, the numbers have been sorted in List1 by their second least significant digit, with ties kept in the order produced by pass 1.


[Figure 3.4: the numbers sorted in List2 at the end of pass 3]
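Since the figures are not reproduced in this text, the following trace lists the array contents after each pass for the example list. It is our own hand computation from Steps 1 to 8, not a transcription of Figs. 3.1 to 3.4. In pass 1 the units digits of 443, 124, 232, 431, 132 are 3, 4, 2, 1, 2, so Step 4 gives FO[1] = LO[1] = 3; FO[2] = 2, tag[2] = 4, LO[2] = 4; FO[3] = LO[3] = 0; and FO[4] = LO[4] = 1. Collecting links 0 to 9 in order then yields:

$$
\begin{aligned}
&\text{Before sorting (List1):} && \{443,\ 124,\ 232,\ 431,\ 132\}\\
&\text{Pass 1 (units digit), List1} \rightarrow \text{List2:} && \{431,\ 232,\ 132,\ 443,\ 124\}\\
&\text{Pass 2 (tens digit), List2} \rightarrow \text{List1:} && \{124,\ 431,\ 232,\ 132,\ 443\}\\
&\text{Pass 3 (hundreds digit), List1} \rightarrow \text{List2:} && \{124,\ 132,\ 232,\ 431,\ 443\}
\end{aligned}
$$

In pass 2, for instance, 431, 232 and 132 share the tens digit 3 and are collected in the order they had after pass 1, which is the stability property relied on in Section 4.3.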

IV. Analysis (T-Sort)

4.1 Time Complexity
Following the algorithm of T-Sort, the loop of Step 0 runs d times, and each iteration executes Steps 1 through 8. The loop of Step 1 takes O(k) time; the loop of Step 2 (with Steps 3 and 4) takes O(n) time; the loop of Step 5 (with Steps 6 and 7) visits each of the k links and copies each of the n elements once, taking O(n + k) time; and the loop of Step 8 takes O(n) time. Therefore the total time required by T-Sort is

$$d\,\big(O(k) + O(n) + O(n + k) + O(n)\big) = O\big(d\,(n + k)\big).$$

That is, T-Sort makes d passes over the data set, each pass (Steps 1 to 8) takes O(n + k) time, and the total computing time is O(d(n + k)), where n is the number of elements in the list, d is the number of digits in the largest key, and k is the number of symbols constituting the elements.

4.2 Space Complexity
T-Sort uses a user-defined data type that holds the N records to be sorted along with one tag for each. When records are large, the tag is negligible compared with the record; when a record is very small, such as a single number, the tag takes as much memory as the record, which is the worst case. User-defined data type used:

typedef struct Num {
    int Record1[N];
    int Record2[N];
    int tag[N];
} Number;

where N is the number of elements. Memory used by the program: for a variable Number X, the worst case, when each record is just a number, is 3N; on average it is 2N, since the tag space is negligible. LO[k] and FO[k] are two small fixed-size arrays, where k is the number of symbols constituting the sort key (k = 10 for numbers, k = 26 for alphabets), so they can be neglected. Therefore the total memory used is 2N on average and 3N in the worst case, i.e. O(N).
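As a rough worked example of these formulas, assume (for illustration only) N = 10^4 integer keys of at most d = 5 decimal digits, k = 10, 4-byte ints, and neglect the small FO/LO arrays and the constant K. These figures count data words only, up to constant factors, and are not meant to predict the whole-program measurements reported in Section V.

$$
\begin{aligned}
\text{T-Sort time:}\quad & d\,(n + k) = 5\,(10^4 + 10) = 50{,}050\\
\text{Radix sort memory (Table 1, integers):}\quad & 11N = 1.1 \times 10^5 \text{ ints} = 440{,}000 \text{ bytes}\\
\text{T-Sort memory (worst case):}\quad & 3N = 3 \times 10^4 \text{ ints} = 120{,}000 \text{ bytes}
\end{aligned}
$$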



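4.3 Stability
Like radix sort, T-Sort does not alter the relative order of any two equal elements: within a pass, elements are appended to the end of their digit's tag chain in input order and are collected in that same order. Hence T-Sort possesses the property of stability.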

V. Experimental Results

The observations were made on a system with a 2.1 GHz Intel processor, 3 GB RAM and a 320 GB hard disk. For a randomly generated list of N numbers, the memory required for sorting was observed as follows.

Table 2: Experimental Results
Number of input elements    Radix sort (KB)    T-Sort (KB)
10                          184                180
50                          268                184
100                         280                188
1000                        316                204
10000                       1296               536
15850                       1336               556

When log(N) < log(Dlarge(N)), counting sort wastes a large amount of memory; that is, the space complexity of counting sort increases asymptotically. Let $w_1 = \log_{10} N$ and $w_2 = \log_{10} D_{large}(N)$. Then the space complexity of counting sort becomes $O(10^{w_2})$ and the wastage is $O(10^{w_2}) - O(10^{w_1})$, so the wasted memory grows as $w_2 - w_1$ increases. T-Sort, by contrast, has a space complexity of only O(2N).
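For a concrete instance of this argument, assume (for illustration only) N = 10^3 keys whose largest value is about 10^6, reading $D_{large}(N)$ as that largest key value so that $10^{w_2} = D_{large}(N)$:

$$
\begin{aligned}
w_1 &= \log_{10} 10^3 = 3, \qquad w_2 = \log_{10} 10^6 = 6,\\
\text{counting-sort space} &\approx 10^{w_2} = 10^6 \text{ counters},\\
\text{wastage} &\approx 10^{w_2} - 10^{w_1} = 10^6 - 10^3 = 999{,}000 \text{ counters},\\
\text{T-Sort space} &= 2N = 2 \times 10^3 \text{ elements}.
\end{aligned}
$$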

VI. Comparison
Comparing the sorting techniques:

Table 3: Comparison
Technique        Time    Space                          Data type         Stable
Counting sort    N       Case dependent                 Integer           Yes
Radix sort       N       K × N (K = 11, 27, 257)        Integer/String    Yes
T-Sort           N       2 × N                          Integer/String    Yes

VII. Conclusion

We have improved the memory utilization of radix and counting sort by developing a new sorting technique, T-Sort, a smart and perceptive sorting algorithm. T-Sort retains the advantages of radix sort (linear time and stability) while eliminating its main disadvantage, the memory needed for one sublist per symbol. The experimental results show the efficiency of T-Sort over radix sort in terms of memory utilization, and the analysis in Section V shows its advantage over counting sort when the keys are large compared with N. Future work will focus on implementing a sorting simulator for all data types, which will be highly effective in terms of memory access time, and on implementing T-Sort on cell phones.

References
Books:
[1] Aaron M. Tenenbaum, Data Structures Using C and C++ (Second Edition, Prentice Hall).
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms (2011, Prentice-Hall India Pvt. Ltd.).
[3] Horowitz, Sahni, Anderson-Freed, Fundamentals of Data Structures in C (Second Edition, Universities Press India Limited).
