
Data Structures

Unit 1
General Concepts

Escuela Superior de Informática de Ciudad Real


Universidad de Castilla-La Mancha

Objectives and competencies


Competencies
BA3 Ability to understand and master the basic concepts of discrete mathematics, logic, algorithmics and computational complexity, and their application to solving engineering problems.
CO6 Knowledge and application of the basic algorithmic procedures of computer technologies to design solutions to problems, analyzing the suitability and complexity of the proposed algorithms.
CO7 Knowledge, design and efficient use of the most suitable data types and data structures for solving a problem.
INS1 Capacity for analysis, synthesis and evaluation.
INS4 Ability to solve problems by applying engineering techniques.
SIS1 Critical reasoning.
SIS3 Autonomous learning.
UCLM2 Ability to use Information and Communication Technologies.
Learning outcomes
Know how to handle data types, data structures and abstract data types correctly and appropriately for the problems at hand, as well as the formal specification, implementation and use of linear and nonlinear abstract data types.
Design solutions to problems, analyzing the suitability and complexity of the proposed algorithms.

References

Basic references
B. Meyer, Construcción de Software Orientado a Objetos, Chapter 6: Abstract Data Types. Second edition, Prentice-Hall (1999).
M. T. Goodrich & R. Tamassia, Data Structures and Algorithms in Java, Chapter 4. 6th Edition, International Student Version, John Wiley (2011).
PowerPoint PDF Handouts:
http://bcs.wiley.com/hebcs/Books?action=chapter&bcsId=8950&itemId=1118808576&chapterId=101146
Specific references
J. V. Guttag, E. Horowitz and D. R. Musser, Abstract Data Types and the Development of Data Structures, Communications of the ACM, 20, 396-404 (1977).
J. V. Guttag, E. Horowitz and D. R. Musser, Abstract Data Types and Software Validation, Communications of the ACM, 21, 1048-1064 (1978).


Outline
1. Introduction
2. Data structures and Abstract Data Types (ADTs)
3. ADTs specification
3.1. UML based specification
3.2. Algebraic specification
3.2.1. Syntactic specification
3.2.2. Semantic specification
4. Analysis of algorithms
4.1. RAM model
4.2. Primitive operations
4.3. Big-Oh notation
4.4. Asymptotic analysis
5. Useful tips (reminder)


1. Introduction (I)
This is a course about what computer science calls data structures. What is a data structure?
A data structure is a data container (it stores and supplies data).
A data structure is related to the concepts of data type and abstract data type.
Remember, a data type represents a set of values (recall the primitive data types).
A structured data type groups a set of variables, possibly of different data types.
An abstract data type (ADT) consists of structured data plus a set of operations available on it.
A data structure corresponds to an ADT.

1. Introduction (II)
Data structures are fundamental components of algorithms (which are specifically covered in Programming Methodology/Metodología de la Programación in the second semester).
In this course, we will consider the most common data structures:
Linear data structures: lists, stacks and queues.
Nonlinear data structures: trees and graphs.


1. Introduction (III)
Very important: THIS COURSE IS NOT FOCUSED ON
PROGRAMMING.
Programming will be mainly a tool for implementing examples
using data structures.
The goal is not, specifically, to implement data structures.
In fact, most of these data structures are already available in standard packages (in Java, for instance, in java.util).
Let's consider both data structures and abstract data types in more detail.
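As an illustration of the point about standard packages, here is a minimal sketch, assuming Java (the language of the Goodrich & Tamassia reference), that uses `java.util.ArrayDeque` as a ready-made stack through the `Deque` interface. The class and method names (`LibraryStackDemo`, `lifoOrder`) are hypothetical, chosen only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class LibraryStackDemo {
    // Pushes every item onto a library-provided stack, then pops them all,
    // returning the items in the order they come out (last in, first out).
    static List<String> lifoOrder(List<String> items) {
        Deque<String> stack = new ArrayDeque<>(); // stack implementation from java.util
        for (String item : items) {
            stack.push(item);
        }
        List<String> out = new ArrayList<>();
        while (!stack.isEmpty()) {
            out.add(stack.pop());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lifoOrder(List.of("a", "b", "c"))); // [c, b, a]
    }
}
```

No stack code had to be written: the course's job is to understand what such a structure offers and what its operations cost.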


2. Data structures and ADTs (I)


It is convenient to start with the concept of Abstract Data Type (ADT), introduced in the 1970s. So, what is an Abstract Data Type?
Aho, Hopcroft & Ullman definition: An ADT is a mathematical
model, together with several operations defined on the model.
NIST (National Institute of Standards and Technology) definition: "ADT: A set of data values and associated operations that are precisely specified independent of any particular implementation."
In very practical terms, an ADT is a set of data (variables) and
functions (methods) acting on that data.

2. Data structures and ADTs (II)


An ADT is an abstract model. Then, how do we use an ADT in a
computer?
Data structures can be described as ADTs.

A data structure is a set of variables handled through the set of methods that represent an ADT.
Remember: An ADT has a correspondence with the concept of
class in object orientation.


2. Data structures and ADTs (III)


From the object orientation standpoint, the data structure
corresponds to a class that implements the ADT methods.
In lay terms, a data structure is (or corresponds to) the practical
realization of an ADT.
Thus, to describe a data structure we must specify the
corresponding ADT.
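A minimal Java sketch of this correspondence, under our own naming assumptions: the hypothetical interface `StackADT` plays the role of the ADT (only the operations are visible), and the hypothetical class `ArrayStack` is one data structure that realizes it with a resizable array. Following the usual Java convention, operations that are undefined on an empty stack throw `NoSuchElementException`.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// The ADT: only the operations are visible.
interface StackADT<E> {
    void push(E e);
    E pop();          // throws NoSuchElementException if the stack is empty
    E top();          // throws NoSuchElementException if the stack is empty
    boolean isEmpty();
    int size();
}

// One data structure realizing the ADT: a resizable array.
final class ArrayStack<E> implements StackADT<E> {
    private Object[] data = new Object[4]; // hidden representation
    private int size = 0;                  // number of stored elements

    @Override
    public void push(E e) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length); // grow when full
        }
        data[size++] = e;
    }

    @Override
    @SuppressWarnings("unchecked")
    public E pop() {
        if (isEmpty()) {
            throw new NoSuchElementException("empty stack");
        }
        E e = (E) data[--size];
        data[size] = null; // drop the reference so it can be garbage collected
        return e;
    }

    @Override
    @SuppressWarnings("unchecked")
    public E top() {
        if (isEmpty()) {
            throw new NoSuchElementException("empty stack");
        }
        return (E) data[size - 1];
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }

    @Override
    public int size() {
        return size;
    }
}
```

Code that declares its variables as `StackADT<E>` depends only on the operations; the array inside `ArrayStack` is an implementation detail.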


2. Data structures and ADTs (IV)


Extremely important: the ADT (or the data structure) is defined by its specification, not by its implementation.
The key point is that the operations define the ADT. The ADT is NOT defined by the data it contains.
Therefore, the way to work with an ADT is through its operations (just by calling the methods of a class).

You NEVER work with an ADT (data structure) by accessing its data directly, because those data depend on the implementation used.
The operations, however, do not depend on the implementation.
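To illustrate this independence, the following sketch (Java; `StackADT`, `ListStack`, `LinkedStack` and `Client.reverse` are hypothetical names) defines two different implementations of the same operations. The client method only calls the operations, so it produces the same result whichever implementation it receives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// The ADT: the client sees only these operations.
interface StackADT<E> {
    void push(E e);
    E pop();
    boolean isEmpty();
}

// Implementation 1: elements kept in an ArrayList (top = last position).
final class ListStack<E> implements StackADT<E> {
    private final List<E> items = new ArrayList<>();

    public void push(E e) { items.add(e); }

    public E pop() {
        if (items.isEmpty()) throw new NoSuchElementException("empty stack");
        return items.remove(items.size() - 1);
    }

    public boolean isEmpty() { return items.isEmpty(); }
}

// Implementation 2: singly linked nodes (top = head node).
final class LinkedStack<E> implements StackADT<E> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<E> head;

    public void push(E e) { head = new Node<>(e, head); }

    public E pop() {
        if (head == null) throw new NoSuchElementException("empty stack");
        E e = head.value;
        head = head.next;
        return e;
    }

    public boolean isEmpty() { return head == null; }
}

final class Client {
    // Reverses a string using only the ADT operations.
    static String reverse(String s, StackADT<Character> stack) {
        for (char c : s.toCharArray()) stack.push(c);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("stack", new ListStack<>()));   // kcats
        System.out.println(reverse("stack", new LinkedStack<>())); // kcats
    }
}
```

Swapping one implementation for the other changes nothing in `Client`, because it never touches the list or the nodes.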


3. ADTs specification
How can we describe (specify) the behavior of an ADT?
The basic idea is to be able to describe the operations of the ADT (remember, the ADT is defined by its operations, not by its data).

There is no single specification approach.

We can use different approaches: the syntax of a programming language, UML notation, or the traditional algebraic specification.
Let's present a simple specification based on UML.


3.1. UML based specification


To describe the behavior of the operations in the ADT we can use
the method specification notation of UML:
method1(parameter1: type, parameter2: type, …): return type

Thus, we can specify the data the method acts on (the formal parameters) and the result it produces (the return type).
We should specify every method (operation) associated with the ADT.
However, the most complete (sometimes too complete)
specification method is the traditional algebraic specification.


3.2. Algebraic specification (I)


The algebraic specification was initially developed in the late 1970s by groups in Europe and the US.
The idea was to use algebraic specification as a formal specification technique for abstract data types.
An algebraic data type specification consists of a syntactic and a semantic specification.

The syntactic specification defines the names, domains and ranges of the ADT's operations (how to use each operation: its form).
The semantic specification contains a set of axioms, in the form of equations, that relate the operations of the ADT to each other (what each operation does: its meaning).


3.2. Algebraic specification (II)


The syntactic specification corresponds to the previously
presented UML specification.
The semantic specification is needed for formal verification
techniques (to prove that the ADT does what it is supposed to do).
Since we are not performing formal verifications, in the rest of the
course we will restrict ourselves to the syntactic specification
(algebraic or other) of the considered ADTs.


3.2.1. Syntactic specification (I)


We will use algebraic notation (some authors use some kind of
programming notation).
In the syntactic specification (or simply syntax) we identify the type
and signature of the operations.
We must specify:
The ADT as a general entity (for instance, a stack, not a stack of integers or a stack of doubles).
Auxiliary types used in the specification.
The operations. We define each operation using the concept (and syntax) of functions: we specify the name of the operation, the set to which each argument belongs, and the set to which the result belongs.
Example: consider the addition (add) of natural numbers:
$\text{add}: \mathbb{N} \times \mathbb{N} \to \mathbb{N}$

add is the name of the operation, $\mathbb{N}$ represents the natural numbers, $\times$ the Cartesian product, and the arrow ($\to$) points to the set the result belongs to.


3.2.1. Syntactic specification (II)


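A syntactic specification for the stack ADT looks like this:
Type: stack
Sorts: el (element), boolean, natural
Operations:
$\text{init}: \;\to \text{stack}$
$\text{push}: \text{stack} \times \text{el} \to \text{stack}$
$\text{pop}: \text{stack} \nrightarrow \text{stack}$ (partial function)
$\text{top}: \text{stack} \nrightarrow \text{el}$ (partial function)
$\text{isEmpty}: \text{stack} \to \text{boolean}$
$\text{size}: \text{stack} \to \text{natural}$

Partial functions ($\nrightarrow$): in some cases the operation is not defined (an exception is raised).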
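The algebraic signature can be mirrored quite literally in Java if every operation is written as a function that returns a new stack instead of modifying its argument. The sketch below assumes that design (an immutable, linked stack); the class name `AlgStack` is hypothetical, and the partial functions `pop` and `top` signal the undefined case by throwing `NoSuchElementException`.

```java
import java.util.NoSuchElementException;

// Immutable stack whose static methods follow the algebraic signature one to one.
final class AlgStack<E> {
    private final E head;            // top element (null for the empty stack)
    private final AlgStack<E> tail;  // stack below the top (null for the empty stack)
    private final int size;          // number of elements

    private AlgStack(E head, AlgStack<E> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    // init : -> stack
    static <T> AlgStack<T> init() {
        return new AlgStack<>(null, null, 0);
    }

    // push : stack x el -> stack (the argument s is left unchanged)
    static <T> AlgStack<T> push(AlgStack<T> s, T e) {
        return new AlgStack<>(e, s, s.size + 1);
    }

    // pop : stack -/-> stack (partial: undefined for init)
    static <T> AlgStack<T> pop(AlgStack<T> s) {
        if (isEmpty(s)) throw new NoSuchElementException("pop(init) is undefined");
        return s.tail;
    }

    // top : stack -/-> el (partial: undefined for init)
    static <T> T top(AlgStack<T> s) {
        if (isEmpty(s)) throw new NoSuchElementException("top(init) is undefined");
        return s.head;
    }

    // isEmpty : stack -> boolean
    static <T> boolean isEmpty(AlgStack<T> s) {
        return s.size == 0;
    }

    // size : stack -> natural
    static <T> int size(AlgStack<T> s) {
        return s.size;
    }

    public static void main(String[] args) {
        AlgStack<Integer> s = push(push(AlgStack.<Integer>init(), 1), 2);
        System.out.println(top(s) + " " + size(s) + " " + size(pop(s))); // 2 2 1
    }
}
```

Note that the signature fixes only names, argument sets and result sets; nothing in it says that `top` returns the last pushed element.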

3.2.2. Semantic specification (I)


With the syntax alone, two data structures could have the same specification but different behavior (a stack and a queue, for instance). The semantics removes this ambiguity.
Here, in the semantics, we specify the behavior of each operation using axioms.
Remember, in traditional logic, an axiom or postulate is a proposition that is not proved or demonstrated but considered to be self-evident. An axiom is a logical statement that is assumed to be true.
For each operation, we define axioms (here, one for each way of building a stack: init and push).


3.2.2. Semantic specification (II)


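A semantic specification looks like this:
$\forall p \in \text{stack},\ \forall e \in \text{el}$
Axioms:
$\text{isEmpty}(\text{init}) = \text{true}$
$\text{isEmpty}(\text{push}(p, e)) = \text{false}$
$\text{top}(\text{init}) = \text{error}$
$\text{top}(\text{push}(p, e)) = e$
$\text{size}(\text{init}) = 0$
$\text{size}(\text{push}(p, e)) = \text{size}(p) + 1$
$\text{pop}(\text{init}) = \text{error}$
$\text{pop}(\text{push}(p, e)) = p$

From now on, we will focus only on the syntactic specification.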
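Axioms cannot be proved by running code, but they can be spot-checked. The following sketch (hypothetical class `StackAxioms`) checks each axiom above for a few sample stacks, using `java.util.ArrayDeque` as the stack implementation. Since `ArrayDeque` is mutable, the helpers copy their argument so that `push(p, e)` and `pop(p)` behave like the functions of the specification, and `error` is interpreted as a `NoSuchElementException`. This is a finite test, not a formal verification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

final class StackAxioms {
    // push(p, e) as a function: copies p, so p itself is not modified.
    static Deque<Integer> push(Deque<Integer> p, int e) {
        Deque<Integer> q = new ArrayDeque<>(p);
        q.push(e);
        return q;
    }

    // pop(p) as a function; throws NoSuchElementException when p is empty.
    static Deque<Integer> pop(Deque<Integer> p) {
        Deque<Integer> q = new ArrayDeque<>(p);
        q.pop();
        return q;
    }

    // top(p); throws NoSuchElementException when p is empty.
    static int top(Deque<Integer> p) {
        return p.element();
    }

    // "= error" is read as: the operation throws NoSuchElementException.
    static boolean isError(Runnable operation) {
        try {
            operation.run();
            return false;
        } catch (NoSuchElementException ex) {
            return true;
        }
    }

    // Compares two stacks element by element, from top to bottom.
    static boolean sameStack(Deque<Integer> a, Deque<Integer> b) {
        return new ArrayList<>(a).equals(new ArrayList<>(b));
    }

    // Checks the eight axioms for one stack p and one element e.
    static boolean axiomsHold(Deque<Integer> p, int e) {
        Deque<Integer> init = new ArrayDeque<>();
        return init.isEmpty()                              // isEmpty(init) = true
            && !push(p, e).isEmpty()                       // isEmpty(push(p, e)) = false
            && isError(() -> top(init))                    // top(init) = error
            && top(push(p, e)) == e                        // top(push(p, e)) = e
            && init.size() == 0                            // size(init) = 0
            && push(p, e).size() == p.size() + 1           // size(push(p, e)) = size(p) + 1
            && isError(() -> pop(init))                    // pop(init) = error
            && sameStack(pop(push(p, e)), p);              // pop(push(p, e)) = p
    }

    public static void main(String[] args) {
        List<Deque<Integer>> samples = new ArrayList<>();
        samples.add(new ArrayDeque<>());
        samples.add(new ArrayDeque<>(List.of(1)));
        samples.add(new ArrayDeque<>(List.of(3, 2, 1)));
        for (Deque<Integer> p : samples) {
            System.out.println(p + " -> " + axiomsHold(p, 7));
        }
    }
}
```

A queue implementation would fail the `top(push(p, e)) = e` check for any non-empty p, which is exactly the ambiguity the semantics resolves.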


4. Analysis of algorithms (I)

An algorithm can be analyzed in terms of time efficiency or space utilization. We will consider only the former right now.
The running time of an algorithm typically grows with the input size.
Types of analysis:
Best case: lower bound on cost. Determined by the easiest input.
Worst case: upper bound on cost. Determined by the most difficult input.
Average case: expected cost for a random input. It requires a model that defines what a random input is.
[Figure: running time (0 to 120) versus input size (1000 to 4000) for the best, average and worst cases]
Average-case time is often difficult to determine, so we focus on the worst-case running time.
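The three cases can be made concrete with a sequential (linear) search, an example not taken from these slides. The sketch below, with hypothetical names `LinearSearch`, `indexOf` and `comparisons`, counts key comparisons. The best case (key in the first position) costs one comparison; the worst case (key absent) costs n. Under the additional assumption that the key is present and equally likely to be at any position, the average is (n + 1)/2 comparisons.

```java
final class LinearSearch {
    static int comparisons; // key comparisons made by the last call to indexOf

    // Returns the index of key in a, or -1 if it is not present.
    static int indexOf(int[] a, int key) {
        comparisons = 0;
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        indexOf(a, 4);
        System.out.println("best case:  " + comparisons); // 1
        indexOf(a, 99);
        System.out.println("worst case: " + comparisons); // 6 (= n)
    }
}
```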

4. Analysis of algorithms (II)


Example: Comparing the growth of the running time, as the input
grows, to the growth of known functions.

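Input size n (logarithms in base 2):

n        log n   n log n   n²     n³     2ⁿ
5        ≈2.3    ≈12       25     125    32
10       ≈3.3    ≈33       100    10³    ≈10³
100      ≈6.6    ≈664      10⁴    10⁶    ≈10³⁰
1000     ≈10     ≈10⁴      10⁶    10⁹    ≈10³⁰⁰
10000    ≈13     ≈10⁵      10⁸    10¹²   ≈10³⁰⁰⁰

Time efficiency, from the slowest-growing (most efficient) to the fastest-growing (least efficient): $\log n < n < n \log n < n^2 < n^3 < 2^n$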
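The table values can be recomputed with a few lines of Java (hypothetical class `GrowthTable`). Because 2ⁿ overflows a `double` for n ≥ 1024, the sketch reports it as a power of ten, using 2ⁿ = 10^(n·log₁₀ 2); the exact exponents (for example, about 10³⁰¹ for n = 1000) are slightly larger than the rounded ones in the table.

```java
final class GrowthTable {
    // Base-2 logarithm.
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    // Decimal exponent of 2^n, since 2^n = 10^(n * log10(2)).
    static double log10OfPow2(int n) {
        return n * Math.log10(2);
    }

    public static void main(String[] args) {
        System.out.printf("%6s %7s %9s %8s %8s %8s%n", "n", "log n", "n log n", "n^2", "n^3", "2^n");
        for (int n : new int[] {10, 100, 1000, 10000}) {
            double square = (double) n * n;
            double cube = square * n;
            System.out.printf("%6d %7.1f %9.0f %8.0e %8.0e %8s%n",
                n, log2(n), n * log2(n), square, cube, "10^" + Math.round(log10OfPow2(n)));
        }
    }
}
```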



A picture is worth a thousand words


4. Analysis of algorithms (III)


To analyze an algorithm we have two options:
To perform an experimental study.
To perform a theoretical analysis.

Problems with the experimental approach:


It is necessary to implement the algorithm, which may be difficult.
Results may not be indicative of the running time on other inputs
not included in the experiment.
In order to compare two algorithms, the same hardware and
software environments must be used.
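For contrast, an experimental study in Java could look like the sketch below (hypothetical names; `sumUpTo` stands in for any algorithm under test). It measures elapsed time with `System.nanoTime()`, so the numbers it prints depend on the machine, the JVM and just-in-time compilation warm-up, which are exactly the problems listed above.

```java
final class TimingExperiment {
    // Algorithm under test (hypothetical): sum of 1..n with a simple loop.
    static long sumUpTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Elapsed nanoseconds for one run of sumUpTo(n).
    static long timeNanos(int n) {
        long start = System.nanoTime();
        long result = sumUpTo(n);
        long elapsed = System.nanoTime() - start;
        if (result < 0) { // uses the result, which discourages the JIT from removing the call
            throw new IllegalStateException("overflow");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 10_000_000; n *= 10) {
            System.out.printf("n = %,d: %,d ns%n", n, timeNanos(n));
        }
    }
}
```

Running it twice, or on another computer, typically gives different numbers, while the operation count of `sumUpTo` is always proportional to n.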


4. Analysis of algorithms (IV)


Advantages of the theoretical approach:
Only needs a high-level description of the algorithm instead of an
implementation.
Characterizes running time as a function of the input size, n.
Takes into account all possible inputs.
Allows us to evaluate the speed of an algorithm independently of
the hardware/software environment.

We present here the theoretical approach.


In it, we describe the algorithm using pseudo-code.
First of all, we need a model for computing the operations the
algorithm performs.


4.1. RAM model (I)


The Random Access Machine (RAM) model is a
simple model to quantify algorithms.

The model considers:


A single (sequential) CPU
A potentially unbounded bank of memory cells,
each of which can hold an arbitrary number or
character.
Memory cells are numbered and accessing any
cell in memory takes unit time.

The goal is to count operations, so we must now decide which basic (primitive) operations we want to count.


4.1. RAM model (II)


What are the basic operations?
Basic computations performed by an algorithm
Identifiable in pseudo-code
Largely independent of the programming language
Exact definition not important (we will see why later)
Assumed to take a constant amount of time in the RAM model. The
time is the same for every operation.


4.1. RAM model (III)


Examples of basic operations:

Evaluating an expression
Assigning a value to a variable
Indexing into an array
Calling a method
Returning from a method

By inspecting the pseudo-code, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.
Let's see an example:


4.1. RAM model (IV)


Example: find the maximum element of an array (the worst case is an array sorted in increasing order, where currentMax is updated at every iteration)
Algorithm arrayMax(A, n)                  # operations
  currentMax ← A[0]                       2
  for i ← 1 to n − 1 do                   2n + 1
    if A[i] > currentMax then             2(n − 1)
      currentMax ← A[i]                   2(n − 1)
    { increment counter i }               2(n − 1)
  return currentMax                       1
                                  Total   8n − 2
Best case? We never execute currentMax ← A[i]
                                  Total   6n
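The counting above can be reproduced in Java by instrumenting the algorithm. In the sketch below (hypothetical class `ArrayMax` and counter `ops`), each statement adds the number of primitive operations assigned to it in the table; the `for` loop is written as a `while` so that the initialization, the n loop tests and the n − 1 increments can be counted separately. An increasing array gives 8n − 2 and a decreasing one gives 6n.

```java
final class ArrayMax {
    static long ops; // primitive operations counted during the last call

    static int arrayMax(int[] a, int n) {
        ops = 0;
        int currentMax = a[0];
        ops += 2;                     // index A[0] and assign
        int i = 1;
        ops += 1;                     // i <- 1
        while (true) {
            ops += 2;                 // compute n - 1 and compare it with i
            if (i > n - 1) {
                break;
            }
            ops += 2;                 // index A[i] and compare with currentMax
            if (a[i] > currentMax) {
                currentMax = a[i];
                ops += 2;             // index A[i] and assign
            }
            i = i + 1;
            ops += 2;                 // add and assign (increment counter i)
        }
        ops += 1;                     // return
        return currentMax;
    }

    public static void main(String[] args) {
        System.out.println(arrayMax(new int[] {1, 2, 3, 4, 5}, 5) + " " + ops); // 5 38 (= 8n - 2)
        System.out.println(arrayMax(new int[] {5, 4, 3, 2, 1}, 5) + " " + ops); // 5 30 (= 6n)
    }
}
```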


4.3. Big-Oh notation (I)


Big-Oh notation characterizes functions according to their growth rates (how fast the function grows).
Let's go back to the worst-case scenario.
Given functions f(n) and g(n), we say that f(n) is O(g(n)) (f(n) is of order g(n)) if there are positive constants $c$ and $n_0$ such that
$f(n) \le c \cdot g(n)$ for all $n \ge n_0$
We say "f(n) is big-Oh of g(n)".
Example: $2n + 10$ is $O(n)$ since:
$2n + 10 \le c\,n \;\Rightarrow\; (c - 2)\,n \ge 10 \;\Rightarrow\; n \ge \frac{10}{c - 2}$ (taking $c > 2$)
Pick $c = 3$; then $n_0 = 10$, and that's it.
[Figure: log-log plot of 3n, 2n + 10 and n, for n from 1 to 1,000]
From $n = 10$ on, $3n$ stays above $2n + 10$, so $c \cdot g(n) = 3n$ bounds $f(n) = 2n + 10$ from above.
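The derivation can be cross-checked numerically. The sketch below (hypothetical class `BigOhCheck`) searches for the smallest n for which 2n + 10 ≤ cn and tests the inequality on a finite range. A finite check only illustrates the definition; the algebraic argument is what covers every n ≥ n₀.

```java
final class BigOhCheck {
    // f(n) = 2n + 10, the function from the example.
    static long f(long n) {
        return 2 * n + 10;
    }

    // Smallest n in [1, limit] with f(n) <= c * n, or -1 if there is none.
    static long smallestN(long c, long limit) {
        for (long n = 1; n <= limit; n++) {
            if (f(n) <= c * n) {
                return n;
            }
        }
        return -1;
    }

    // True if f(n) <= c * n for every n in [n0, limit] (a finite check, not a proof).
    static boolean holdsFrom(long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f(n) > c * n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(smallestN(3, 1_000));         // 10, i.e. 10 / (3 - 2)
        System.out.println(holdsFrom(3, 10, 1_000_000)); // true
        System.out.println(smallestN(4, 1_000));         // 5, i.e. 10 / (4 - 2)
    }
}
```

With c = 2 no n works, which matches the requirement c > 2 in the derivation.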

4.3. Big-Oh notation (II)


The big-Oh notation gives an upper bound on the growth rate of a function (which is why it pairs naturally with the worst case).
The statement "f(n) is O(g(n))" means that the growth rate of f(n) is no more than the growth rate of g(n): f(n) does not grow faster than g(n).
We can use the big-Oh notation to rank functions according to their growth rate.

                   f(n) is O(g(n))   g(n) is O(f(n))
g(n) grows more    Yes               No
f(n) grows more    No                Yes
Same growth        Yes               Yes


4.4. Asymptotic analysis

Now, to compare algorithms, we apply the big-Oh notation to the running time (# operations). This is called asymptotic analysis: the running time is expressed in big-Oh notation.
To perform the asymptotic analysis:
We find the worst-case number of primitive operations executed as a function of the input size, T(n).
We are interested in what happens with T(n) as $n \to \infty$.
We express the resulting function with big-Oh notation.

Example:
We determine that algorithm arrayMax executes at most $8n - 2$ primitive operations.
$T(n) \approx 8n$ as $n \to \infty$
We say that algorithm arrayMax runs in O(n) time.

Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations.
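The step from $8n - 2$ to $O(n)$ can be written out against the big-Oh definition; the constants below are one valid choice among many.

```latex
\begin{aligned}
T(n) &= 8n - 2 \le 8n \quad \text{for all } n \ge 1
  &&\Rightarrow\; T(n) \text{ is } O(n) \text{ with } c = 8,\ n_0 = 1,\\
\frac{T(n)}{n} &= 8 - \frac{2}{n} \;\longrightarrow\; 8 \quad \text{as } n \to \infty .
\end{aligned}
```

The constant 8 depends on how primitive operations were counted, which is why it is dropped: only the linear growth survives.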

5. Useful tips (Reminder)


We collect here the concepts needed in this course from
Programming Fundamentals I and Programming Fundamentals
II.
Programming Fundamentals I (Fundamentos de Programación I):
Recursion

Programming Fundamentals II (Fundamentos de Programación II):

Class definition
Inheritance
Polymorphism
Interfaces
Exceptions
Generics
Linked list/variables


Recommended activities
Recommended readings
http://www.informatik.unibremen.de/agbkb/forschung/formal_methods/completed_projects/compass/7years_e.htm
http://en.wikipedia.org/wiki/Exponential_growth
http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem
Recommended activities
Invent some ADTs, and develop the corresponding specification.
Solve the unsolved problems from the list proposed.
Apply the asymptotic analysis to your own algorithms.

