Você está na página 1de 59

http://www.datagovernance.

com/

Unit 4: Data Management

Unit 5: Data Management

Wang,X., ENGO351

Data management:
Outline

Definition of a database

Database management systems

Conceptual data modelling

Logical data modelling

The entity-relationship model

The relational database model

The standard query language (SQL)

Unit 5: Data Management

Wang,X., ENGO351

Database
A database is a logical, structured collection of data about
things, their attributes and their relationships to each other

A database is a collection of related data.

Within a GIS, these things and relationships have a


spatial component

Unit 5: Data Management

Wang,X., ENGO351

Spatial Data Examples


Examples of non-spatial data
Names, UCID, grades of a student
Examples of Spatial data
Census Data
NASA satellites imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms
Medical Imaging
Exercise: Identify spatial and non-spatial data items in
A phone book
A cookbook with recipes

Unit 5: Data Management

Wang,X., ENGO351

Why use database?


Data Redundancy and Inconsistency - The same piece of
information may be duplicated in several different files.
Difficulty in Accessing Data: Conventional file processing
environments do not allow data to be retrieved in a convenient and
efficient manner.
Data Isolation: Data is scattered in a number of different files.
Linking these data to provide new information may require a
substantial amount of programming to answer each specific query.
Concurrent Access Problems: Multiple users attempting to access
the same file may lead to problems.
Security Problems: Certain users should not be allowed to access
certain data-sets and not be permitted to edit certain data that they
do not own.
Integrity Problems: Some data values should satisfy certain
consistency constraints.
Unit 5: Data Management

Wang,X., ENGO351

Database management system


A database management system (DBMS) is a collection
of software programs that facilitates the processes of
defining, construction, and manipulating the database for
various applications

A DBMS provides tools for data input, search, retrieval,


manipulation, and output

Most commercial GIS include database management


tools for local databases

ArcGIS uses Microsoft Access

Unit 5: Data Management

Wang,X., ENGO351

Data model
A data model is a description or view of the real world and
data modeling is the process that formalizes the description
or view at different levels of data abstraction

The real-world is made up of complex spatial objects and


phenomena

A data model tends to be tailored to a specific application or


problem context

Different users may have different data models

Unit 5: Data Management

Wang,X., ENGO351

Data models and levels of abstraction


Data modeling involves three steps corresponding to
increasing levels of abstraction:

conceptual modeling

logical modeling

physical modeling

Unit 5: Data Management

Wang,X., ENGO351

Conceptual model (1)

A conceptual
model represents
the users
perception of the
real world
Data abstraction
is strictly limited
to the description
of the information
content of the
users view of the
real world,
without any
concern for
computer
implementation
(Brown, 1997)
Unit 5: Data Management

Wang,X., ENGO351

Conceptual model (2)

The function of a conceptual data model is to provide the


necessary language to describe how we naturally
conceptualize data organization
Although conceptual data modeling is part of the database
design process, conceptual models are database-independent

A conceptual model provides a basic reference for users


who need to understand the structure of the data in the
system

Unit 5: Data Management

Wang,X., ENGO351

Conceptual model (3)

A conceptual data
model provides a
way to
communicate
between users,
designers and
computers

(Worboys, 2004)
Unit 5: Data Management

Wang,X., ENGO351

The entity-relationship model

The entity-relationship (E-R) data model is one of the


most commonly-used conceptual data models for GIS
data modelling

It is based on the concepts of:


entities
attributes

relationships

to represent real-world features, their properties and their


relationships

Unit 5: Data Management

Wang,X., ENGO351

The entity-relationship diagram

An entity-relationship diagram (E-R diagram) can be


used to express the features and properties of an E-R
model

(Lo and Yeung, 2007)

Unit 5: Data Management

Wang,X., ENGO351

Entity
An entity is a thing or object in the real world that is
distinguishable from all other objects

each person in a classroom is an entity

An entity has an independent existence

physical existence: a person, a car, a forest

conceptual existence: a company, a job, an agenda

Entities

can be thought of as nouns.

Examples:

Entities

a computer, an employee, a song, a mathematical theorem

are represented as rectangles.

Unit 5: Data Management

Wang,X., ENGO351

Relationships between entities

A relationship captures how two or more entities are


related to one another.
i.e. the relationship performs between the entity artist and song

Relationships can be thought of as verbs, linking two or


more nouns
Relationships are represented as diamonds, connected
by lines to each of the entities in the relationship.

Unit 5: Data Management

Wang,X., ENGO351

Attributes

Entity and relations can have attributes.


i.e. an employee entity might have an SIN attribute
i.e. entity City has attribute types name, population, density

The attributes are associated to each occurrence of the


entity type

Attributes are represented as ellipses connected to their


owning entity sets by a line.

Unit 5: Data Management

Wang,X., ENGO351

Relationship types (1)

Relationship types are subdivided into three categories:


one-to-one (1:1)
many-to-one (N:1)
many-to-many (M:N)

Participation constraint: entity occurrence may only exist


if it participates in a relationship.
One can have total and partial participation constraints

B
Double line m e ans B c a nnot exist
without A. A c an exist without B.

i.e. one director is the manager of at most one cinema (double


lines means mandatory)

(Worboys, 2004)
Unit 5: Data Management

Wang,X., ENGO351

Relationship types (2)

In a many-to-one relationship, many occurrences of one


entity type may have a relationship with at most one
occurrence of another entity type
i.e. many cinemas are located in a town

(Worboys, 2004)

Unit 5: Data Management

Wang,X., ENGO351

Relationship types (3)

In a many-to-many
relationship, many
occurrences of one
entity type may have a
relationship with many
occurrences of another
entity type
i.e. many roads connect
many cities

(Worboys, 2004, p. 58)


Unit 5: Data Management

Wang,X., ENGO351

Process of creating E-R model


1. Start with textual description
2. Identify entities (nouns)
3. Tabulate entities
4. Determine relationships
5. Determine cardinality ratio/participation constraints
6. Determine attributes & key attributes

Unit 5: Data Management

Wang,X., ENGO351

An Example: World Database


Conceptual Model
3 Entities: Country, City, River
2 Relationships: capital-of, originates-in
Attributes listed in the figure

Unit 5: Data Management

Wang,X., ENGO351

Logical model (1)

A logical data model represents the real world by means


of diagrams, lists, and tables designed to reflect the
recording of the data in terms of some formal language

Logical models are software dependent; they must be


expressed in terms of the language of a specific
database management system (i.e. relational model)

Unit 5: Data Management

Wang,X., ENGO351

Logical model (2)

This figure
illustrates the
logical model
that was
created from
the conceptual
model
previously
shown

(Brown, 1997)
Unit 5: Data Management

Wang,X., ENGO351

The relational database model (1)


The relational database model,
developed by Codd in the early 1970s, is
the most widely used logical database
model in the computer industry

It

represents the database as a collection


of tables (simple files) called relations.

Unit 5: Data Management

Wang,X., ENGO351

The relational database model (2)

(Aronoff, 1993)

Unit 5: Data Management

Wang,X., ENGO351

The relational database model (3)


Each table in the
database
represents an entity
type identified in
the data modeling
process

(Brown, 1997)
Unit 5: Data Management

Wang,X., ENGO351

Column and tuple


In a particular table,
each column represents an attribute

each row, called a tuple (or record), represents a collection of


data values associated to the occurrence of an entity

(Lo and Yeung, 2007)

Unit 5: Data Management

Wang,X., ENGO351

Relation

A relation can be simply thought of as a table of data

A relation is made up of a set of tuples


There can be any number of tuples in a relation
Within a relation, the logical order of the tuples is not
important

Unit 5: Data Management

Wang,X., ENGO351

Domain
A domain is a set of attribute values associated to each
column of a relation

A domain is identified by an attribute name

There are occasions when the values of some attributes


within a particular tuple are unknown or missing

a special value, called null (which is not zero), is assigned

Unit 5: Data Management

Wang,X., ENGO351

Primary and foreign keys (1)


Tables, in a relational database, are connected to each
other using keys
A primary key represents one or more attributes (columns
in the table) whose values can uniquely identify a record in
a table
Its counterpart in another table for the purpose of linkage
is called a foreign key

http://www.datagovernance.com/

Unit 5: Data Management

Wang,X., ENGO351

Primary and foreign keys (2)


Thus, a key
common to two
tables are used to
establish
connections
between
corresponding
records in the
tables

Unit 5: Data Management

Wang,X., ENGO351

Primary and foreign keys (3)


Keys are a simple
way to connect
tables within a
relational database

Collector

Unit 5: Data Management

Wang,X., ENGO351

Primary key
Attribute values of the primary key allow users to identify
individual tuples uniquely

e.g. SIN, an identification number

The primary key must not contain null values (Why?)

What if cannot use of a certain attribute alone in a relation


to identify the unique identification of a tuple?

use one or more additional attributes to form a compound key


create an additional attribute to hold a unique identifier for each
occurrence of an entity

Unit 5: Data Management

Wang,X., ENGO351

Foreign key
A foreign key is an attribute in a relation that is a primary
key in another relation

The identical values of the primary and the foreign keys


make it possible to logically link different tuples in different
relations

Unit 5: Data Management

Wang,X., ENGO351

Join and Relate


Two

common operations for linking tables in a relational


database are join and relate.
A

join operation brings together two tables by using a


key that is common to both tables.
A

relate operation temporarily connects two tables but


keeps the tables physically separate.

35

Figure 8.10 (Chang, 2012)


Primary key and foreign key provide the linkage to join the table on the
right to the feature attribute table on the left.

36

Figure 8.11 (Chang, 2012)


This example of a many-to-one relationship in the SSURGO database
relates three tree species in cotreestomng to the same soil component
in component.
37

Figure 8.12 (Chang, 2012)


This example of a one-to-many relationship in the SSURGO database
relates one soil map unit in mapunit to two soil components in
component.

38

Normalisation
Normalising is the splitting of the database into multiple
tables.
Two main reasons for normalising a database:
prevents unnecessary duplication of data, thus conserving time
and disk space, and in some cases, preventing errors.
makes it easier to extract exactly the information from the
database.

Each normal form is built on the normal form before it.


e.g. if a table is in 3-NF it is in 2_NF

Unit 5: Data Management

Wang,X., ENGO351

Normal Forms
1-NF: attributes are atomic or single valued.
2-NF: All non-primary attributes are fully dependent on
the primary key; no non-primary attributes are
functionally determined by a subset of the key
3-NF: No non-prime attributes are functionally
determined by another non-prime attribute; all nonprimary attributes are directly dependent on the primary
key.
Example: A company runs many projects. Employees work on
particular tasks on different projects. How do we invoice the time
each employee allocates to their different tasks on each project?

Unit 5: Data Management

Wang,X., ENGO351

Guidelines for relational database design


1. Create E-R model
2. Produce table for each entity
3. For 1:1 relationships,
add foreign key to either participating entities
For 1:M relationships,
add foreign key to entity on "Many" side of relationship
4. For N:M relationships, create a new table
5. Check design is "normalised"

Unit 5: Data Management

Wang,X., ENGO351

Advantages of the relational model

A relational database is simple and flexible

Each table in the database can be prepared, maintained,


and edited separately from other tables

The tables can remain separate until a query or an


analysis requires that attribute data from different tables be
linked together

Unit 5: Data Management

Wang,X., ENGO351

Physical model

A physical data model describes the physical storage of


the data in the computer by record formats and access
paths

It is hardware dependent and is concerned primarily with


the implementation details of a database

It is intended for the system programmer and database


manager, not the general user

(Lo and Yeung, 2007)

Unit 5: Data Management

Wang,X., ENGO351

(Lo and Yeung, 2007)

Unit 5: Data Management

Wang,X., ENGO351

SQL
SQL (Structured Query Language) is a data query
language designed for relational databases

SQL has been developed by IBM in the 1970s and many


commercial database management systems such as
Oracle, DB2, Access, and Microsoft SQL Server have since
adopted the query language

This is an English-like language that consists of a set of


powerful and flexible commands for the manipulation of the
data in the relational tables

Unit 5: Data Management

Wang,X., ENGO351

Relational algebra
The SQL language is made with a set of relational
operators and commands that can be combined to query a
database. These operators and commands form what is
called relational algebra

Unit 5: Data Management

Wang,X., ENGO351

Relational operators (1)

With SQL, six logical operators can be used:

equal to

not equal to

smaller than

greater than

smaller or equal to

greater or equal to

Unit 5: Data Management

Wang,X., ENGO351

Relational operators (2)


With SQL, three boolean
operators can be used:

NOT

AND

OR

The shaded area represents:

the complement of A (NOT)

the union of A and B (OR)

the intersection of A and B (AND)

Unit 5: Data Management

(Chang, 2006)

Wang,X., ENGO351

SQL commands (1)


SQL is used to create and query the database using a set
of SQL commands. e.g.

the create command creates databases and tables

the select command selects rows that have been inserted into the
tables

By using SQL commands, the user needs only to specify


the tables, columns, and row qualifiers to retrieve any data
item in the entire database

the users do not need to know the technical details of how the

data are stored

Unit 5: Data Management

Wang,X., ENGO351

SQL commands (2)

SQL commands are used to perform two main functions:

to define the database structure (data definition)

e.g., CREATE

to

insert, modify, and retrieve data from the database (data


manipulation)

e.g. SELECT

Unit 5: Data Management

Wang,X., ENGO351

CREATE command

To create a database:

CREATE DATABASE database_name

To create a table in a database:


CREATE TABLE table_name (column_name1 data_type,
column_name2 data_type, ....... )

CREATE TABLE ADDRESS_BOOK (


NAME CHAR (30)
COMPANY CHAR (20)
E-MAIL CHAR (25)
)
NAME
COMPANY

Unit 5: Data Management

ADDRESS_BOOK

E-MAIL

Wang,X., ENGO351

INSERT and DELETE

To insert new rows into a table:

INSERT INTO table_name VALUES (value1, value2,....)

INSERT INTO River (Name, Origin, Length) VALUES (Mississippi,


USA, 6000)

To delete rows in a table:


DELETE FROM table_name WHERE column_name =
some_value

DELETE FROM City


WHERE Country = Canada;

NAME

COMPANY

E-MAIL

John Smith

Travelcity

smith@travelcity.com

INSERT INTO ADDRESS_BOOK (NAME, COMPANY, E-MAIL)


VALUES (John Smith, Travelcity, smith@travelcity.com)
DELETE FROM ADDRESS_BOOK
WHERE COMPANY =
Travelcity;
Unit 5: Data Management

Wang,X., ENGO351

SELECT command
The select command extracts data items in specified
rows of a table. It returns a new table that has a subset of
tuples of the original.

SELECT * FROM table_name

SELECT column_name(s) FROM table_name

SELECT column_name(s) FROM table_name(s) WHERE conditions

GROUP BY column_name

Unit 5: Data Management

Wang,X., ENGO351

World database data tables

Unit 5: Data Management

Wang,X., ENGO351

SELECT Example 1.
Simplest Query has SELECT and FROM clauses
Query: List all the cities and the country they belong to.

SELECT Name, Country


FROM CITY

Result

Unit 5: Data Management

Wang,X., ENGO351

SELECT Example 2.
Commonly 3 clauses (SELECT, FROM, WHERE) are used
Query: List the names of the capital cities in the CITY table.
SELECT Name
FROM CITY
WHERE CAPITAL=Y
SELECT *
FROM CITY
WHERE CAPITAL=Y

Result

Unit 5: Data Management

Wang,X., ENGO351

SELECT Example 3
Query: List the attributes of countries in the Country
relation where the life-expectancy is less than seventy
years.
SELECT Co.Name,Co.Life-Exp
FROM Country Co
WHERE Co.Life-Exp <70
Note: use of alias Co for Table Country

Result

Unit 5: Data Management

Wang,X., ENGO351

Multi-table Query Examples


Query: List the capital cities and populations of
countries whose GDP exceeds one trillion dollars.
Note:Tables City and Country are joined by matching City.Country
= Country.Name.

SELECT Ci.Name,Co.Pop
FROM City Ci,Country Co
WHERE Ci.Country =Co.Name
AND Co.GDP >1000.0

AND Ci.Capital=Y

Unit 5: Data Management

Wang,X., ENGO351

Multi-table Query Example


Query: What is the name and population of the capital city
in the country where the St. Lawrence River originates?

SELECT Ci.Name, Ci.Pop


FROM City Ci, Country Co, River R
WHERE R.Origin =Co.Name
AND Co.Name =Ci.Country
AND R.Name =St.Lawrence
AND Ci.Capital=Y

Unit 5: Data Management

Wang,X., ENGO351

Você também pode gostar