Unit 1 Data Structure

Data Structures
Unit 1
General Concepts
Escuela Superior de Informática de Ciudad Real
Universidad de Castilla-La Mancha
Objectives and competencies

UNIT 1: GENERAL CONCEPTS
Competencies
BA3 Ability to understand and master the basic concepts of discrete
mathematics, logic, algorithmics and computational complexity, and to apply
them to solve engineering problems.
CO6 Knowledge and application of the basic algorithmic procedures of
computing technologies to design solutions to problems, analyzing the
suitability and complexity of the proposed algorithms.
CO7 Knowledge, design and efficient use of the data types and data
structures best suited to solving a problem.
INS1 Capacity for analysis, synthesis and evaluation.
INS4 Ability to solve problems by applying engineering techniques.
SIS1 Critical reasoning.
SIS3 Autonomous learning.
UCLM2 Ability to use Information and Communication Technologies.
Learning outcomes
To handle data types, data structures and abstract data types correctly and
appropriately for the problem at hand, including the formal specification,
implementation and use of linear and nonlinear abstract data types.
To design solutions to problems, analyzing the suitability and complexity of
the proposed algorithms.
References
Basic references
B. Meyer, Construcción de Software Orientado a Objetos, Capítulo 6: Tipos
Abstractos de Datos. 2nd edition, Prentice-Hall (1999)
M. T. Goodrich & R. Tamassia, Data Structures and Algorithms in Java,
Chapter 4. 6th Edition, International Student Version, John Wiley (2011).
PowerPoint PDF Handouts:
http://bcs.wiley.com/hebcs/Books?action=chapter&bcsId=8950&itemId=1118808576&chapterId=101146
Specific references
J. V. Guttag, E. Horowitz and D. R. Musser, Abstract Data Types and the
Development of Data Structures, Communications of ACM, 20, 396-404 (1977)
J. V. Guttag, E. Horowitz and D. R. Musser, Abstract Data Types and Software
Validation, Communications of ACM, 21, 1048-1064 (1978)
Outline
1. Introduction
2. Data structures and Abstract Data Types (ADTs)
3. ADTs specification
3.1. UML based specification
3.2. Algebraic specification
3.2.1. Syntactic specification
3.2.2. Semantic specification
4. Analysis of algorithms
4.1. RAM model
4.2. Primitive operations
4.3. Big-Oh notation
4.4. Asymptotic analysis
5. Useful tips (reminder)
1. Introduction (I)
This course is about what computer science calls data structures.
What is a data structure?
A data structure is a data container: it stores and supplies data.
A data structure is related to the concepts of data type and abstract
data type.
Remember, a data type represents a set of values (recall the
primitive data types).
Structured data is a set of variables of possibly different data
types.
An abstract data type involves structured data plus a set of
operations available for it.
A data structure corresponds to an ADT.
1. Introduction (II)
Data structures are fundamental components of algorithms (which
are covered specifically in Programming
Methodology/Metodología de la Programación in the second
semester).
In this course, we will consider the most common data structures:
Data structures
  Linear: Lists, Stacks, Queues
  Nonlinear: Trees, Graphs
1. Introduction (III)
Very important: THIS COURSE IS NOT FOCUSED ON
PROGRAMMING.
Programming will be mainly a tool for implementing examples
using data structures.
The goal is not, specifically, to implement data structures.
In fact, most of the data structures are already available in standard
packages.
Let's consider both data structures and abstract data types in more
detail.
2. Data structures and ADTs (I)

It is convenient to start with the concept of the Abstract Data
Type (ADT), introduced in the 1970s. So, what is an Abstract Data
Type?
Aho, Hopcroft & Ullman definition: An ADT is a mathematical
model, together with several operations defined on the model.
NIST (National Institute of Standards and Technology) definition:
ADT: A set of data values and associated operations that are
precisely specified independent of any particular implementation.
In very practical terms, an ADT is a set of data (variables) and
functions (methods) acting on that data.
2. Data structures and ADTs (II)

An ADT is an abstract model. How, then, do we use an ADT in a
computer?
Data structures can be described as ADTs.
A data structure is a set of variables that respond to the set of
methods representing an ADT.
Remember: an ADT corresponds to the concept of a class in object
orientation.
2. Data structures and ADTs (III)

From the object orientation standpoint, the data structure
corresponds to a class that implements the ADT methods.
In lay terms, a data structure is (or corresponds to) the practical
realization of an ADT.
Thus, to describe a data structure we must specify the
corresponding ADT.
2. Data structures and ADTs (IV)

Extremely important: The ADT (or the data structure) is defined by
its specification, not by its implementation.
The key point is that the operations define the ADT. The ADT is NOT
defined by the data it contains.
Therefore, the way to work with an ADT is through its operations
(just by calling the methods in a class).
You NEVER work with an ADT (data structure) by accessing its data
because those data depend on the implementation used.
However, the operations do not depend on the implementation.
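As a minimal sketch of this point, the Java code below defines one set of operations and two unrelated representations behind it. All names (IntStack, ArrayIntStack, LinkedIntStack, sumAndDrain) are hypothetical and chosen only for illustration. The client method touches nothing but the operations, so it works unchanged with either implementation.

```java
import java.util.ArrayList;
import java.util.List;

class AdtDemo {
    /** The ADT as seen by clients: operations only, no data. */
    interface IntStack {
        void push(int e);
        int pop();            // not defined on an empty stack: throws
        boolean isEmpty();
    }

    /** Implementation 1: elements kept in a resizable array. */
    static class ArrayIntStack implements IntStack {
        private final List<Integer> items = new ArrayList<>();
        public void push(int e) { items.add(e); }
        public int pop() {
            if (items.isEmpty()) throw new IllegalStateException("empty stack");
            return items.remove(items.size() - 1);
        }
        public boolean isEmpty() { return items.isEmpty(); }
    }

    /** Implementation 2: elements kept in a chain of linked nodes. */
    static class LinkedIntStack implements IntStack {
        private static final class Node {
            final int value;
            final Node next;
            Node(int value, Node next) { this.value = value; this.next = next; }
        }
        private Node head;
        public void push(int e) { head = new Node(e, head); }
        public int pop() {
            if (head == null) throw new IllegalStateException("empty stack");
            int v = head.value;
            head = head.next;
            return v;
        }
        public boolean isEmpty() { return head == null; }
    }

    /** Client code: uses only the operations, never the stored data. */
    static int sumAndDrain(IntStack s) {
        int total = 0;
        while (!s.isEmpty()) total += s.pop();
        return total;
    }

    public static void main(String[] args) {
        IntStack a = new ArrayIntStack();
        IntStack b = new LinkedIntStack();
        for (int i = 1; i <= 4; i++) { a.push(i); b.push(i); }
        System.out.println(sumAndDrain(a) + " " + sumAndDrain(b)); // 10 10
    }
}
```

Swapping one implementation for the other requires no change in sumAndDrain, which is exactly what defining the ADT by its operations buys us.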
3. ADTs specification
How can we describe (specify) the behavior of an ADT?
The basic idea is to describe the operations of the ADT
(remember, the ADT is defined by its operations, not by its data).
There is no single specification approach.
We can use different approaches: the syntax of a programming language,
UML notation, or the traditional algebraic specification.
Let's present a simple specification based on UML.
3.1. UML based specification

To describe the behavior of the operations in the ADT we can use
the method specification notation of UML:
method1(parameter1: type, parameter2: type, ...): return type
Thus, we can specify the data that the method acts on (the formal
parameters) and the result produced (the return type).
We should specify every method (operation) associated with the ADT.
However, the most complete (sometimes too complete)
specification method is the traditional algebraic specification.
3.2. Algebraic specification (I)

Algebraic specification was initially developed in the late 1970s
by groups in Europe and the US.
The idea was to use algebraic specification as a formal
specification technique for abstract data types.
An algebraic data type specification consists of a syntactic and a
semantic specification.
The syntactic specification defines the names, domains and ranges
of the ADT's operations (how to use the operation: form).
The semantic specification contains a set of axioms in the form of
equations that relate the operations of the ADT to each other (what
the operation does: meaning).
3.2. Algebraic specification (II)

The syntactic specification corresponds to the previously
presented UML specification.
The semantic specification is needed for formal verification
techniques (to prove that the ADT does what it is supposed to do).
Since we are not performing formal verifications, in the rest of the
course we will restrict ourselves to the syntactic specification
(algebraic or other) of the considered ADTs.
3.2.1. Syntactic specification (I)

We will use algebraic notation (some authors use some kind of
programming notation).
In the syntactic specification (or simply syntax) we identify the type
and signature of the operations.
We must specify:
The ADT (as a general entity, for instance a stack, not a stack of
integers or a stack of doubles)
Auxiliary types used in the specification
The operations. We define each operation using the concept (and
syntax) of functions. We specify the name of the operation, the set
to which each argument belongs and the set to which the result belongs.
Example: consider the addition (add) of natural numbers:
add: N x N → N
add is the name of the operation, N represents the natural numbers,
x the Cartesian product, and the arrow (→) the result.
3.2.1. Syntactic specification (II)

The syntactic specification (syntax) of the stack ADT looks like this:
Type: stack
Sorts: el (element), boolean, natural
Operations:
init    :            → stack
push    : stack x el → stack
pop     : stack      ↛ stack     (partial function)
top     : stack      ↛ el        (partial function)
isEmpty : stack      → boolean
size    : stack      → natural
Partial functions: in some cases the operation is not defined
(Exception)
3.2.2. Semantic specification (I)

With the syntax alone we could have data structures with the same
specification but with different behavior (a stack and a queue, for
instance). The semantics resolves the ambiguity.
In the semantics, we specify the behavior of each operation
using axioms.
Remember, in traditional logic, an axiom or postulate is a
proposition that is not proved or demonstrated but considered to
be self-evident. An axiom is a logical statement that is assumed to
be true.
For each operation, we define one or more axioms.
3.2.2. Semantic specification (II)

A semantic specification looks like this:
∀ p ∈ stack, e ∈ element
Axioms:
isEmpty(init)       = true
isEmpty(push(p, e)) = false
top(init)           = error
top(push(p, e))     = e
size(init)          = 0
size(push(p, e))    = size(p) + 1
pop(init)           = error
pop(push(p, e))     = p
From now on we will focus just on the syntactic specification.
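Although the course does not use the semantics further, the stack axioms above can be read almost directly as executable checks. The following Java sketch assumes an immutable stack, so that push and pop return a new stack value just as the algebraic signature suggests. The class names PStack and StackAxioms and the method axiomsHold are hypothetical, and "error" is modeled here, as an assumption, by throwing NoSuchElementException.

```java
import java.util.NoSuchElementException;

class StackAxioms {
    /** Immutable stack: every operation returns a value, as in the signature. */
    static final class PStack<E> {
        private final E head;          // top element (unused for the empty stack)
        private final PStack<E> tail;  // stack below the top (null for the empty stack)
        private final int size;

        private PStack(E head, PStack<E> tail, int size) {
            this.head = head;
            this.tail = tail;
            this.size = size;
        }

        /** init: -> stack */
        static <E> PStack<E> init() { return new PStack<>(null, null, 0); }

        /** push: stack x el -> stack */
        PStack<E> push(E e) { return new PStack<>(e, this, size + 1); }

        /** pop: stack -/-> stack (partial: not defined on init) */
        PStack<E> pop() {
            if (isEmpty()) throw new NoSuchElementException("pop(init) = error");
            return tail;
        }

        /** top: stack -/-> el (partial: not defined on init) */
        E top() {
            if (isEmpty()) throw new NoSuchElementException("top(init) = error");
            return head;
        }

        boolean isEmpty() { return size == 0; }
        int size() { return size; }
    }

    /** Checks every axiom for one sample stack p and one element e. */
    static boolean axiomsHold(PStack<String> p, String e) {
        PStack<String> init = PStack.init();
        boolean ok = init.isEmpty()
                && !p.push(e).isEmpty()
                && p.push(e).top().equals(e)
                && init.size() == 0
                && p.push(e).size() == p.size() + 1
                && p.push(e).pop() == p;   // pop undoes push: the very same stack
        try { init.top(); ok = false; } catch (NoSuchElementException expected) { }
        try { init.pop(); ok = false; } catch (NoSuchElementException expected) { }
        return ok;
    }

    public static void main(String[] args) {
        PStack<String> p = PStack.<String>init().push("a").push("b");
        System.out.println(axiomsHold(p, "c")); // true
    }
}
```

Checking the axioms on sample values is only a test, not the formal verification the semantic specification is meant to support.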
4. Analysis of algorithms (I)
Running Time
An algorithm can be analyzed in terms of time efficiency or space
utilization. We will consider only the former for now.
The running time of an algorithm typically grows with the input size.
Types of analysis
Best case: lower bound on cost. Determined by the easiest input.
Worst case: upper bound on cost. Determined by the most difficult
input.
Average case: expected cost for a random input. Requires a model that
defines what a random input is.
[Figure: running time versus input size, with best-case, average-case and
worst-case curves]
Average-case time is often difficult to determine. We focus on the
worst-case running time.
4. Analysis of algorithms (II)

Example: comparing the growth of the running time, as the input
grows, with the growth of known functions.
Input size (n)   log n   n log n   n^2     n^3      2^n
5                -       15        25      125      32
10               -       33        100     10^3     10^3
100              -       664       10^4    10^6     10^30
1000             10      10^4      10^6    10^9     10^300
10000            13      10^5      10^8    10^12    10^3000
Growth rate: log n < n < n log n < n^2 < n^3 < 2^n
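The entries of such a comparison are easy to recompute. The Java sketch below assumes base-2 logarithms (the usual convention in algorithm analysis) and reports 2^n only through its order of magnitude, since the value itself quickly overflows any primitive type. The class and method names are hypothetical.

```java
class GrowthTable {
    /** Base-2 logarithm of n. */
    static double log2(double n) { return Math.log(n) / Math.log(2); }

    /** n log n, rounded to the nearest integer. */
    static long nLogN(long n) { return Math.round(n * log2(n)); }

    /** Order of magnitude of 2^n: the k with 10^k <= 2^n < 10^(k+1). */
    static long magnitudeOfTwoPowN(long n) {
        return (long) Math.floor(n * Math.log10(2));
    }

    public static void main(String[] args) {
        long[] sizes = {10, 100, 1000, 10000};
        for (long n : sizes) {
            System.out.printf("n=%d  n log n=%d  n^2=%d  n^3=%d  2^n~10^%d%n",
                    n, nLogN(n), n * n, n * n * n, magnitudeOfTwoPowN(n));
        }
    }
}
```

For n = 10 and n = 100 this gives n log n = 33 and 664, and 2^100 is of the order of 10^30, while n^3 for the same n is only 10^6.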

4. Analysis of algorithms (III)

To analyze an algorithm we have two options:
To perform an experimental study.
To perform a theoretical analysis.
Problems with the experimental approach:

It is necessary to implement the algorithm, which may be difficult.
Results may not be indicative of the running time on other inputs
not included in the experiment.
In order to compare two algorithms, the same hardware and
software environments must be used.
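For reference, an experimental study usually boils down to something like the Java sketch below: run the algorithm on inputs of growing size and record the elapsed time. The method sumTo is a hypothetical stand-in for the algorithm under study; System.nanoTime is the standard Java call for measuring elapsed time.

```java
class TimingSketch {
    /** Sums the first n integers; stands in for the algorithm under study. */
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    /** Elapsed nanoseconds for one run on input size n. */
    static long timeNanos(int n) {
        long start = System.nanoTime();
        long result = sumTo(n);
        long elapsed = System.nanoTime() - start;
        if (result < 0) throw new AssertionError("overflow"); // keeps the result in use
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n = 1_000_000; n <= 8_000_000; n *= 2) {
            System.out.println(n + " -> " + timeNanos(n) + " ns");
        }
    }
}
```

The printed times change from machine to machine and even from run to run, and they say nothing about inputs that were not tried, which are precisely the problems listed above.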
4. Analysis of algorithms (IV)

Advantages of the theoretical approach:
Only needs a high-level description of the algorithm instead of an
implementation.
Characterizes running time as a function of the input size, n.
Takes into account all possible inputs.
Allows us to evaluate the speed of an algorithm independently of
the hardware/software environment.
We present here the theoretical approach.

In it, we describe the algorithm using pseudo-code.
First of all, we need a model for counting the operations the
algorithm performs.
4.1. RAM model (I)

The Random Access Machine (RAM) model is a
simple model for quantifying the cost of algorithms.
The model considers:
A single (sequential) CPU
A potentially unbounded bank of memory cells,
each of which can hold an arbitrary number or
character.
Memory cells are numbered, and accessing any
cell in memory takes unit time.
The goal is to count operations, so we must
now decide which basic, primitive
operations we want to count.
4.1. RAM model (II)

What are the basic operations?
Basic computations performed by an algorithm
Identifiable in pseudo-code
Largely independent from the programming language
Exact definition not important (we will see why later)
Assumed to take a constant amount of time in the RAM model. The
time is the same for every operation.
4.1. RAM model (III)

Examples of basic operations:
Evaluating an expression
Assigning a value to a variable
Indexing into an array
Calling a method
Returning from a method
By inspecting the pseudo-code, we can determine the maximum
number of primitive operations executed by an algorithm, as a
function of the input size.
Let's see an example:
4.1. RAM model (IV)

Example: find the maximum element of an array (the worst case occurs
when the array is in increasing order, so the maximum is the last
element and currentMax is updated in every iteration)
Algorithm arrayMax(A, n)                 # operations
  currentMax ← A[0]                      2
  for i ← 1 to n - 1 do                  2n + 1
    if A[i] > currentMax then            2(n - 1)
      currentMax ← A[i]                  2(n - 1)
    { increment counter i }              2(n - 1)
  return currentMax                      1
                                  Total  8n - 2
Best case? The maximum is the first element, so we never execute
currentMax ← A[i]
                                  Total  6n
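The count can be reproduced by instrumenting the algorithm. The Java sketch below charges operations line by line exactly as in the table above. How the 2n + 1 of the loop header is split (1 for initializing i plus 2 for each of the n loop tests) is an assumption made here for illustration, as are the class name and the ops counter. The array is assumed to be non-empty.

```java
class ArrayMaxCount {
    /** Primitive operations charged during the most recent call to arrayMax. */
    static long ops;

    /** Returns the maximum of a non-empty array, counting primitive operations. */
    static int arrayMax(int[] a) {
        int n = a.length;
        ops = 0;
        int currentMax = a[0];
        ops += 2;                   // index into A, assign
        int i = 1;
        ops += 1;                   // initialize the loop counter
        while (true) {
            ops += 2;               // loop test, evaluated n times in total
            if (i > n - 1) break;
            ops += 2;               // index into A, compare
            if (a[i] > currentMax) {
                currentMax = a[i];
                ops += 2;           // index into A, assign
            }
            i++;
            ops += 2;               // increment counter i
        }
        ops += 1;                   // return
        return currentMax;
    }

    public static void main(String[] args) {
        int[] increasing = {1, 2, 3, 4, 5};  // worst case, n = 5
        int[] maxFirst = {5, 1, 2, 3, 4};    // best case, n = 5
        arrayMax(increasing);
        System.out.println(ops);             // 8n - 2 = 38
        arrayMax(maxFirst);
        System.out.println(ops);             // 6n = 30
    }
}
```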
4.3. Big-Oh notation (I)

Big-Oh notation characterizes functions according to their growth
rates (how fast the function grows).
Let's go back to the worst-case scenario.
Given functions f(n) and g(n), we say that f(n) is O(g(n)) (f(n) is
of order g(n)) if there are positive constants c and n0 such that
f(n) ≤ c g(n) for n ≥ n0
We say "f(n) is big-oh of g(n)".
Example: 2n + 10 is O(n) since:
2n + 10 ≤ c n, then
(c - 2) n ≥ 10, then
n ≥ 10/(c - 2)
Pick c = 3, then n0 = 10, and that's it.
[Figure: log-log plot of 3n, 2n + 10 and n for n from 1 to 1,000]
From n = 10 on, 2n + 10 never exceeds 3n: the two functions grow at the
same (linear) rate.
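The choice of constants can be sanity-checked numerically. The Java sketch below tests f(n) ≤ c g(n) over a finite range for f(n) = 2n + 10 and g(n) = n; the class and method names are hypothetical. A finite check is of course not a proof (the algebra above is), but it quickly exposes a wrong choice of c or n0.

```java
class BigOhCheck {
    static long f(long n) { return 2 * n + 10; }
    static long g(long n) { return n; }

    /** True if f(n) <= c * g(n) for every n in [n0, limit]. */
    static boolean boundHolds(long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f(n) > c * g(n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundHolds(3, 10, 1_000_000)); // true
        System.out.println(boundHolds(3, 9, 1_000_000));  // false: f(9) = 28 > 27
        System.out.println(boundHolds(2, 1, 1_000_000));  // false: 2n + 10 > 2n always
    }
}
```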
4.3. Big-Oh notation (II)

The big-Oh notation gives an upper bound (remember, the worst
case) on the growth rate of a function.
The statement f(n) is O(g(n)) means that the growth rate of f(n) is
no more than the growth rate of g(n). Your function does not grow
faster than g(n).
We can use the big-Oh notation to rank functions according to their
growth rate.
                   f(n) is O(g(n))   g(n) is O(f(n))
g(n) grows more    Yes               No
f(n) grows more    No                Yes
Same growth        Yes               Yes
4.4. Asymptotic analysis

Now, to compare algorithms, we apply big-Oh notation to the
running time (# operations). This is called asymptotic analysis: the
running time is expressed in big-Oh notation.
To perform the asymptotic analysis:
We find the worst-case number of primitive operations executed as
a function of the input size, T(n).
We are interested in what happens to T(n) when n → ∞ (the limit).
We express the resulting function in big-Oh notation.
Example:
We determine that algorithm arrayMax executes at most 8n - 2
primitive operations.
T(n) ≈ 8n when n → ∞
We say that algorithm arrayMax runs in O(n) time.
Since constant factors and lower-order terms are eventually
dropped anyhow, we can disregard them when counting primitive
operations.
5. Useful tips (Reminder)

We collect here the concepts needed in this course from
Programming Fundamentals I and Programming Fundamentals II.
Programming Fundamentals I (Fundamentos de Programación I):
Recursion
Programming Fundamentals II (Fundamentos de Programación II):
Class definition
Inheritance
Polymorphism
Interfaces
Exceptions
Generics
Linked list/variables
Recommended activities
Recommended readings
http://www.informatik.unibremen.de/agbkb/forschung/formal_methods/completed_projects/compass/7years_e.htm
http://en.wikipedia.org/wiki/Exponential_growth
http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem
Recommended activities
Invent some ADTs, and develop the corresponding specification.
Solve the unsolved problems from the list proposed.
Apply the asymptotic analysis to your own algorithms.