
Advanced Data Structures

Saurabh Pandey
Department of Computer Application
Azad Institute of Engineering & Technology
Lucknow, India
e-mail: saurabhpandey7@indiatimes.com

Abstract— This paper investigates how to maintain an efficient dynamic ordered set of bit strings, which is an important problem in the field of information search and information processing. Generally, a dynamic ordered set is required to support five essential operations: search, insertion, deletion, max-value retrieval and next-larger-value retrieval. Building on previous research, we present an advanced data structure named the rich binary tree (RBT), which follows both the binary-search-tree property and the digital-search-tree property. In addition, every key K keeps the most significant difference bit (MSDB) between itself and the next larger value among K's ancestors, as well as that between itself and the next smaller one among its ancestors. With the new data structure, we can maintain a dynamic ordered set in O(L) time. Since computers represent objects in binary, our method has great potential in applications. In fact, RBT can be viewed as a general-purpose data structure for problems concerning order, such as searching, sorting and maintaining a priority queue. For example, when RBT is applied to sorting, we get an algorithm that is linear in the number of keys and whose performance is far better than quicksort. What makes it more powerful than quicksort is that RBT supports constant-time dynamic insertion and deletion.

Keywords- information processing; dynamic ordered set; algorithms and data structures; rich binary tree
I. INTRODUCTION

Data structures play a central role in modern computer science. You interact with data structures much more often than with algorithms (think of Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This paper covers major results and current directions of research in data structures:

• Classic comparison-based data structures. The area is still rich with open problems, such as whether there is a single best (dynamically optimal) binary search tree.

• Dynamic graph problems. In almost any network, a link's availability and speed are anything but constant, which has led to a re-evaluation of the common understanding of graph problems: how to maintain essential information, such as a minimum-weight spanning forest, while the graph changes.

• Integer data structures: beating the O(lg n) barrier in sorting and searching. If you haven't seen this before, beating O(lg n) may come as a surprise. If you have seen this before, you might think that it's about a bunch of grungy bit-tricks. In fact, it is about fundamental issues regarding information and communication. We hope to give a cleaner and more modern view than you might have seen before.

• Data structures for range queries (and other geometric problems) and problems on trees. There are surprising equivalences between such problems, as well as interesting solutions.

• Data structures optimized for external memory, and cache-oblivious data structures. Any problem (e.g., sorting, priority queues) is different when you are dealing with disk instead of main memory, or when you care about cache performance. Memory hierarchies have become important in practice because of the recent escalation in data size.

Figure 1: Structure in lab view
II. COMPLEXITY OF ALGORITHM

An algorithm is a sequence of steps that gives a method of solving a problem; it creates the logic of a program. The efficiency of an algorithm depends on two major criteria: the first is the running time of the algorithm and the second is the space it requires. The running time of an algorithm is the time taken by the program for execution. Complexity tends to be used to characterize something with many parts in intricate arrangement. The study of these complex linkages is the main goal of network theory and network science. In science[1] there are at this time a number of approaches to characterizing complexity, many of which are reflected in this article. In a business context, complexity management is the methodology to minimize value-destroying complexity and efficiently control value-adding complexity in a cross-functional approach.
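To make the two criteria concrete, the short C sketch below (our own illustration; the function names are not from any cited source) contrasts two ways of computing the sum 1 + 2 + ... + n: a loop whose running time grows linearly with n, and a closed-form expression that runs in constant time, both using constant extra space.

    #include <stdio.h>

    /* O(n) time, O(1) space: one addition per loop iteration. */
    long sum_loop(long n)
    {
        long total = 0;
        for (long i = 1; i <= n; i++)
            total += i;
        return total;
    }

    /* O(1) time, O(1) space: Gauss's closed form n(n+1)/2. */
    long sum_formula(long n)
    {
        return n * (n + 1) / 2;
    }

    int main(void)
    {
        long n = 100000;
        /* Both calls return the same value; only the work performed differs. */
        printf("loop: %ld  formula: %ld\n", sum_loop(n), sum_formula(n));
        return 0;
    }

Both functions produce the same result, but the loop performs n additions while the formula performs a fixed amount of work; this is exactly the kind of difference the running-time criterion captures.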
III. CHOOSING AN ALGORITHM

• For every problem there is a multitude of algorithms that solve it, so you have a choice of algorithms to code up as programs.

• If a program is likely to be used only once on a small amount of data, then you should select the algorithm that is easiest to implement. Code it up correctly, run it and move on to something else.

• But if the program will be used many times and has a lifetime that makes maintenance likely, then other factors come into play, including readability, extensibility, portability, reusability, ease of use and efficiency. It is only efficiency that we will be looking at in this part of the paper.

Figure 2: Web Mining Process

The resulting algorithm is named 'Weighted Page Rank'[9]: we boost the weights of links in whose anchor (a window of a fixed width) query terms occur.
IV. BEST, WORST AND AVERAGE CASE

In computer science, the best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, but it could also be memory or other resources.

In real-time computing, the worst-case execution time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm will always finish on time.

Average performance and worst-case performance are the most used in algorithm analysis. Less widely found is best-case performance, but it does have uses: for example, where the best cases of individual tasks are known, they can be used to improve the accuracy of an overall worst-case analysis. The term best-case performance is used in computer science to describe the way an algorithm behaves under optimal conditions. For example, the best case for a simple linear search on a list occurs when the desired element is the first element of the list.

Worst-case performance analysis and average-case performance analysis have some similarities, but in practice usually require different tools and approaches. On the other hand, some algorithms, such as hash tables, have very poor worst-case behaviour, but a well-written hash table of sufficient size will statistically never give the worst case; the average number of operations performed follows an exponential decay curve, and so the running time of an operation is statistically bounded.

Consider linear search on a list of n elements. In the worst case, the search must visit every element once. This happens when the value being searched for is either the last element in the list or is not in the list. However, on average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only n/2 elements.
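As a concrete illustration of these cases, here is a minimal linear search in C (a sketch of our own, not code taken from a cited implementation); the comments spell out which inputs produce the best, worst and average behaviour described above.

    /* Returns the index of key in a[0..n-1], or -1 if it is absent.
       Best case:    key is a[0]            (1 comparison).
       Worst case:   key is last or missing (n comparisons).
       Average case: about n/2 comparisons when key is present and
       every position is equally likely. */
    int linear_search(const int a[], int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }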
Consider also quicksort applied to a list of n elements, again assumed to be all different and initially in random order. This popular sorting algorithm has an average-case performance of O(n log n), which contributes to making it a very fast algorithm in practice. But given a worst-case input, its performance degrades to O(n^2).
V. SORTING ALGORITHM

In computer science and mathematics, a sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. The output is in non-decreasing order (each element is no smaller than the previous element according to the desired total order).

Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956. Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space trade-offs, and lower bounds.

Figure 3: Bubble Sort
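Since bubble sort is mentioned above and shown in Figure 3, a minimal C version is sketched here for reference; it is the plain O(n^2) formulation, with no early-exit optimization.

    /* Bubble sort: repeatedly swap adjacent out-of-order elements.
       Worst and average case O(n^2) comparisons; O(1) extra space. */
    void bubble_sort(int a[], int n)
    {
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }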
VI. RED-BLACK TREE

A red-black tree is a type of self-balancing binary search tree, a data structure used in computing science, typically used to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer[1] and named the "symmetric binary B-tree," but it acquired its modern name in a 1978 paper by Leonidas J. Guibas and Robert Sedgewick.[2] It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the total number of elements in the tree. Put very simply, a red-black tree is a binary search tree that inserts and removes intelligently, to ensure the tree is reasonably balanced.

A red-black tree is a special type of binary tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers. Red-black trees, like all binary search trees, allow efficient in-order (Left-Root-Right) traversal of their elements. The search time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time.

More precisely, a red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary search trees, the following requirements apply to red-black trees:

• A node is either red or black.
• The root is black. (This rule is sometimes omitted from other definitions. Since the root can always be changed from red to black, but not necessarily vice versa, this rule has little effect on analysis.)
• All leaves are black.
• Both children of every red node are black.
• Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.

Time complexity in big O notation:

             Average       Worst case
    Space    O(n)          O(n)
    Search   O(log n)      O(log n)
    Insert   O(log n)      O(log n)
    Delete   O(log n)      O(log n)
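One possible C representation of such nodes, together with a recursive check of the equal-black-count requirement listed above, is sketched below; the type and field names are our own choices rather than part of any standard API, and NULL pointers play the role of the black leaves.

    #include <stddef.h>

    /* Illustrative node layout; not taken from a specific library. */
    enum color { RED, BLACK };

    struct rb_node {
        int key;
        enum color color;
        struct rb_node *left, *right;   /* NULL stands for a black leaf */
    };

    /* Returns the black-height if every root-to-leaf path contains the
       same number of black nodes, or -1 if the property is violated. */
    int black_height(const struct rb_node *n)
    {
        if (n == NULL)
            return 1;                   /* NULL leaves count as black */
        int lh = black_height(n->left);
        int rh = black_height(n->right);
        if (lh < 0 || rh < 0 || lh != rh)
            return -1;
        return lh + (n->color == BLACK ? 1 : 0);
    }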
VII. EULER TOUR TREE

Dynamic trees, also known as link-cut trees, are able to represent a dynamic forest of rooted trees in O(log n) amortized time per operation. Here we examine a simpler, although not strictly better, alternative to link-cut trees known as Euler tour trees. We will then use Euler tour trees to achieve dynamic connectivity in general graphs in O(log^2 n) time. Finally, we will survey some of what is and is not known for dynamic graphs.

Euler tour trees are due to Henzinger and King [1] and are an alternative to link-cut trees for representing dynamic trees. Euler tour trees are simpler and easier to analyze than link-cut trees, but do not naturally store aggregate information about paths in the tree. Euler tour trees are well suited for storing aggregate information on subtrees, which is a feature we will use in the next section. The idea behind Euler tour trees is to store the Euler tour of the tree. In an arbitrary graph, an Euler tour is a path that traverses each edge exactly once. For trees we say that each edge is bidirectional, so the Euler tour of a tree is the path through the tree that begins at the root and ends at the root, traversing each edge exactly twice: once to enter the subtree, and once to exit it. The Euler tour of a tree is essentially the depth-first traversal of the tree that returns to the root at the end.

Figure 5: The correspondence between a tree and its Euler tour

An Euler tour tree supports the following operations:

FindRoot(v): Find the root of the tree containing node v. In the Euler tour of a tree, the root is visited first and last. Therefore we simply return the minimum or maximum element in the BST.

Cut(v): Cut the subtree rooted at v from the rest of the tree. Note that the Euler tour of v's subtree is a contiguous subsequence of visits that starts and ends with v, contained in the sequence of visits for the whole tree. To cut the subtree rooted at v, we may simply split the BST before its first and after its last visit to v. This splitting gives us the Euler tour of the tree before reaching v, the tour of v's subtree, and the tour of the tree after leaving v. Concatenating the first and last pieces together, and possibly deleting one redundant visit between the end of the first piece and the beginning of the last, gives us our answer.

Link(u, v): Insert u's subtree as a child of v. In the resulting Euler tour, we need to traverse u's subtree immediately after and immediately before visits to v. Therefore we will split v's traversal before the last visit to v, and then concatenate onto the left piece a singleton visit to v, followed by the Euler tour of u's subtree, followed by the right piece.

Each of these operations performs a constant number of search, split, and merge operations on the Euler tour tree. Each of those operations takes O(log n) on a balanced BST data structure. Consequently, the total running time for Euler tour trees is O(log n) per operation. If we use a B-tree with fanout of O(log n) instead of a balanced BST in our Euler tour trees, we can achieve O(log n / log log n) searches (from the depth of the tree) and O(log^2 n / log log n) updates (from the depth times the branching factor).

In the underlying BST model, the operations we are allowed to do are to copy a finger, follow a pointer to a node, and to retrieve or change the contents of a node's fields. So, the BST model allows for a subset of the data structures allowable in the pointer-machine model. We will focus on data structures in this model that solve the dynamic connectivity problem.
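As an illustration of the tour itself (not of the full data structure), the C sketch below records the Euler tour of a rooted tree by a depth-first traversal: a node is recorded on entry and again after returning from each child, so every edge is traversed exactly twice and the root appears first and last. The node type and the fixed-size child array are assumptions made for the sketch; an actual Euler tour tree would store this sequence in a balanced BST so that Cut and Link can be implemented with the splits and concatenations described above.

    #define MAX_CHILDREN 8

    /* Illustrative tree node; field names are our own. */
    struct node {
        int id;
        int nchildren;
        struct node *child[MAX_CHILDREN];
    };

    /* Appends the Euler tour of the subtree rooted at v to tour[]:
       v is recorded on entry and once more after returning from each
       child, so every edge of the tree is traversed exactly twice.
       *len is the current length of tour[]. */
    void euler_tour(const struct node *v, int tour[], int *len)
    {
        tour[(*len)++] = v->id;               /* first visit to v */
        for (int i = 0; i < v->nchildren; i++) {
            euler_tour(v->child[i], tour, len);
            tour[(*len)++] = v->id;           /* back at v after child i */
        }
    }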
VIII. QUICK SORT

Quicksort can be made to go quadratic by constructing input on the fly in response to the sequence of items compared. The technique is illustrated by a specific adversary for the standard C qsort function. The general method works against any implementation of quicksort, even a randomizing one, that satisfies certain very mild and realistic assumptions. When using quicksort one often feels a nagging tension: suppose it goes quadratic? Tactics to avoid embarrassing results in some low-entropy cases, such as already ordered input, are cited in most reports of algorithms going quadratic in real-life applications. No matter how hard implementers try, they cannot (without great sacrifice of speed) defend against all inputs. This section describes an adversarial method that finds chinks in the defenses of any implementation. A polymorphic implementation of quicksort, such as the standard C function qsort, never looks at the data. It relies instead on an externally supplied comparison function. And that allows us to monitor and influence the program's progress noninvasively. To do so we make a comparison function that observes the pattern of comparisons and constructs adverse data on the fly. Recall that quicksort sorts a sequence of n data items in three phases:

1. Pick a data item as the pivot. We assume that this phase uses O(1) comparisons.

2. Partition the data into three parts that respectively contain all items less than the pivot, the pivot item itself, and all items greater than the pivot. The placement of items equal to the pivot varies among implementations.

3. Sort the parts before and after the pivot recursively.

In C, the skeleton of such an implementation looks like this, where partition() places the pivot in its final position and returns its index:

    void quicksort( int a[], int low, int high )
    {
        int pivot;
        /* Termination condition! */
        if ( high > low )
        {
            pivot = partition( a, low, high );
            quicksort( a, low, pivot-1 );
            quicksort( a, pivot+1, high );
        }
    }
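The partition() routine assumed by the skeleton above is not given in the original listing; one common possibility is the Lomuto scheme sketched below, which uses the last element as the pivot. This is an illustrative choice, not necessarily the scheme the original code relied on.

    /* Lomuto-style partition (illustrative): uses a[high] as the pivot,
       places it in its final position and returns that index. */
    int partition( int a[], int low, int high )
    {
        int pivot = a[high];
        int i = low;
        for (int j = low; j < high; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[high]; a[high] = tmp;
        return i;
    }

With this partition, the call quicksort(a, 0, n - 1) sorts the array in place; a sorted or reverse-sorted input drives it to its quadratic worst case, which is the behaviour the adversary described above provokes deliberately.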
Quicksort is a divide-and-conquer algorithm. Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists. The steps are:

• Pick an element, called the pivot, from the list.

• Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.

• Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
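The adversarial construction described earlier in this section exploits the fact that the standard C qsort receives an externally supplied comparison function. The harmless variant below, a sketch of our own, uses the same hook merely to count comparisons; an adversary would instead use the callback to build a worst-case input on the fly.

    #include <stdio.h>
    #include <stdlib.h>

    static long ncompares;              /* number of comparisons observed */

    /* Ordinary integer comparator, instrumented to count its calls. */
    static int counting_cmp(const void *pa, const void *pb)
    {
        ncompares++;
        int a = *(const int *)pa;
        int b = *(const int *)pb;
        return (a > b) - (a < b);
    }

    int main(void)
    {
        int data[] = { 5, 3, 8, 1, 9, 2, 7, 4, 6, 0 };
        size_t n = sizeof data / sizeof data[0];

        qsort(data, n, sizeof data[0], counting_cmp);
        printf("qsort made %ld comparisons on %zu items\n", ncompares, n);
        return 0;
    }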
AVERAGE COMPLEXITY:

Even if pivots aren't chosen randomly, quicksort still requires only O(n log n) time on average over all possible permutations of its input. Because this average is simply the sum of the times over all permutations of the input divided by n factorial, it is equivalent to choosing a random permutation of the input. When we do this, the pivot choices are essentially random, leading to an algorithm with the same running time as randomized quicksort. More precisely, the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation

    C(n) = (n - 1) + (1/n) * sum_{i=0}^{n-1} [ C(i) + C(n-1-i) ],  with C(0) = C(1) = 0,

whose solution is approximately C(n) = 1.39 n log2 n. Here, n - 1 is the number of comparisons the partition uses. Since the pivot is equally likely to fall anywhere in the sorted list order, the sum averages over all possible splits. This means that, on average, quicksort performs only about 39% worse than the ideal number of comparisons, which is its best case. In this sense it is closer to the best case than to the worst case. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.

The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using O(log n) extra space. The outline of a formal proof of the O(n log n) expected time complexity follows. Assume that there are no duplicates, as duplicates could be handled with linear-time pre- and post-processing, or considered cases easier than those analyzed. Choosing a pivot, uniformly at random from 0 to n - 1, is then equivalent to choosing the size of one particular partition, uniformly at random from 0 to n - 1.
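As a small numerical check of the recurrence above (our own illustration, not part of the cited analysis), the program below evaluates C(n) directly and compares it with the known closed form 2(n+1)H_n - 4n, which grows like 2n ln n, i.e. roughly 1.39 n log2 n.

    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        /* C(n) = n - 1 + (2/n) * sum_{i=0}^{n-1} C(i), C(0) = C(1) = 0,
           should agree with the closed form 2(n+1)H_n - 4n. */
        static double c[N + 1];
        double prefix = 0.0;          /* running sum of c[0..n-1] */
        double harmonic = 1.0;        /* H_1 */

        for (int n = 2; n <= N; n++) {
            prefix += c[n - 1];
            c[n] = (n - 1) + 2.0 * prefix / n;
            harmonic += 1.0 / n;      /* H_n */
        }
        printf("recurrence:  C(%d) = %.2f\n", N, c[N]);
        printf("closed form: 2(n+1)H_n - 4n = %.2f\n",
               2.0 * (N + 1) * harmonic - 4.0 * N);
        return 0;
    }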

Figure 8: Quick sort

IX. CONCLUSION

Various algorithms and sorting techniques are used in advanced data structures to arrange data in memory. Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient algorithm can significantly impact system performance. In time-sensitive applications, an algorithm taking too long to run can render its results outdated or useless. Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive calls. The algorithms make exactly the same comparisons, but in a different order. The most direct competitor of quicksort is heapsort. Heapsort's worst-case running time is always O(n log n). But heapsort is assumed to be on average somewhat slower than quicksort. This is still debated and under research, with some publications indicating the opposite. Quicksort retains the chance of worst-case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it. Red-black trees, like all binary search trees, allow efficient in-order (Left-Root-Right) traversal of their elements. The search time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time.

X. ACKNOWLEDGMENT

I would like to acknowledge and extend my sincere gratitude to the following persons who have made the completion of this research possible: the Head of the Department of Computer Science and Engineering, Mr. Shafeeq Ahmad, and Asst. Prof. Mr. Hemant Kumar Singh, for providing me with timely guidance and motivation, and all Computer Science and Engineering Department faculty members and staff.

REFERENCES

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. & Stein, Clifford (2001). Introduction to Algorithms. Chapter 1: Foundations (Second ed.). Cambridge, MA: MIT Press and McGraw-Hill. pp. 3–122. ISBN 0262032937.

[2] Sedgewick, Robert (1998). Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching (3rd ed.). Reading, MA: Addison-Wesley Professional. ISBN 9780201314526.
[3] Knuth, Donald. The Art of Computer Programming. Addison-Wesley.

[4] Greene, Daniel A.; Knuth, Donald E. (1982). Mathematics for the Analysis of Algorithms (Second ed.). Birkhäuser. ISBN 374633102X.

[5] Goldreich, Oded (2008). Computational Complexity: A Conceptual Perspective. Cambridge University Press. ISBN 9780521884730.

[6] MathWorld: Red-Black Tree.

[7] San Diego State University: CS 660: Red-Black tree notes, by Roger Whitney.

[8] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 13: Red-Black Trees.

[9] Brin, Sergey; Page, Lawrence; "The Anatomy of a Large-Scale Hypertextual Web Search Engine;" 7th Int. WWW Conf. Proceedings, Brisbane, Australia; April 1998.

[10] J. M. Zayed, N. Nouvel, U. Rauwald, O. A. Scherman. Chemical Complexity – supramolecular self-assembly of synthetic and biological building blocks in water. Chemical Society Reviews, 2010, 39, 2806–2816. http://pubs.rsc.org/en/Content/ArticleLanding/2010/CS/b922348g

[11] Weaver, Warren (1948). "Science and Complexity". American Scientist 36 (4): 536. PMID 18882675. http://www.ceptualinstitute.com/genre/weaver/weaver-1947b.htm. Retrieved 2007-11-21.

[12] R. E. Tarjan, U. Vishkin: "Finding biconnected components and computing tree functions in logarithmic parallel time".

[13] A. LaMarca and R. E. Ladner. "The Influence of Caches on the Performance of Sorting." Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997. pp. 370–379.

[14] Faron Moller. Analysis of Quicksort. CS 332: Designing Algorithms. Department of Computer Science, Swansea University.

[15] Conrado Martínez and Salvador Roura. Optimal sampling strategies in quicksort and quickselect. SIAM J. Computing 31(3):683–705, 2001.
