Disclaimer: These slides are just for the purpose of easy reading and are not comprehensive in nature. Thus the slides have to be read together with the class lectures, reading material, and statutes dealing with the subject
RDBMS/Slide 1
Introduction
Flat files: 1960s - 1980s
Hierarchical: 1970s - 1990s
Network: 1970s - 1990s
Relational: 1980s - present
Object-oriented: 1990s - present
Object-relational: 1990s - present
Data warehousing: 1980s - present
Web-enabled: 1990s - present
RDBMS/Slide 2
Introduction
File-based Systems: Entities or objects of interest are represented by records that are stored together in files. Relationships between objects are represented by using directories of various kinds.
RDBMS/Slide 3
Introduction
RDBMS/Slide 4
Introduction
Common characteristics required for data models: A data model must show some degree of conceptual simplicity without compromising the semantic completeness. A data model must represent the real world as closely as possible. The representation of the real-world transformations (behavior) must be in compliance with the consistency and integrity characteristics of any data model.
RDBMS/Slide 5
Introduction
An organization must have accurate and reliable data for effective decision making. To this end, the organization maintains records on the various facets of its operations by building models of the objects of interest. These models capture the essential properties of the objects and record the relationships among them. Such related data is called a database.
A database system is an integrated collection of related files, along with details of the interpretation of the data contained therein.
A database management system is a software system that allows access to data contained in a database.
RDBMS/Slide 6
Introduction
Three Level Architecture
The architecture is divided into three levels: the external level, the conceptual level, and the internal level. The view at each of these levels is described by a scheme. A scheme is an outline or a plan that describes the records and relationships existing in the view. The word scheme, which means a systematic plan for attaining some goal, is used interchangeably in the database literature with the word schema.
RDBMS/Slide 7
External Level: View A, View B, View C
Conceptual Level: Conceptual View
Internal Level: Internal View
RDBMS/Slide 8
External view is at the highest level of abstraction where only those portions of the database of concern to a user or application program are included.
The external schema consists of the definition of the logical records and the relationships in the external view.
RDBMS/Slide 9
The conceptual schema describes all the records and relationships included in the conceptual view.
RDBMS/Slide 10
RDBMS/Slide 11
Data Independence
The three levels of abstraction, along with the mappings from internal to conceptual and from conceptual to external, provide two distinct levels of data independence:
Logical Data Independence
Physical Data Independence
RDBMS/Slide 12
Data Independence
Logical Data Independence: The conceptual schema can be changed without affecting the existing external schemas. The change is absorbed by the mapping between the external and conceptual levels. Logical data independence also insulates application programs from operations such as combining two records into one or splitting an existing record into two or more records. It is achieved by providing the external level, or user view, of the database.
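As a minimal sketch of how the external/conceptual mapping can absorb such a change, the following uses Python's built-in sqlite3 module with an in-memory database. The table and view names (employee, employee_core, employee_contact, staff_view) and the sample rows are hypothetical and are not taken from the slides. The application query against the view stays the same after one record type is split into two.

```python
import sqlite3

# The application only ever issues this query against the external view.
APP_QUERY = "SELECT id, name, phone FROM staff_view ORDER BY id"

conn = sqlite3.connect(":memory:")

# Original conceptual schema: a single record type (hypothetical names).
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO employee VALUES (101, 'Jones', '555-0101'), (103, 'Smith', '555-0103');
    CREATE VIEW staff_view AS SELECT id, name, phone FROM employee;
""")
before = conn.execute(APP_QUERY).fetchall()

# Conceptual change: split the record into two tables.
# Only the view definition (the external/conceptual mapping) is rewritten.
conn.executescript("""
    DROP VIEW staff_view;
    CREATE TABLE employee_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_contact (id INTEGER PRIMARY KEY, phone TEXT);
    INSERT INTO employee_core SELECT id, name FROM employee;
    INSERT INTO employee_contact SELECT id, phone FROM employee;
    DROP TABLE employee;
    CREATE VIEW staff_view AS
        SELECT c.id, c.name, t.phone
        FROM employee_core AS c JOIN employee_contact AS t ON t.id = c.id;
""")
after = conn.execute(APP_QUERY).fetchall()

print(before == after)
```

The view plays the role of the external schema here; a real DBMS may offer richer mapping mechanisms than this sketch suggests.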
RDBMS/Slide 13
Data Independence
Physical Data Independence: The physical storage structures or devices used for storing the data can be changed without necessitating a change in the conceptual view or any of the external views. The change is absorbed by the mapping between the conceptual and internal levels. It is achieved by the presence of the internal level of the database and the mapping or transformation from the conceptual level of the database to the internal level.
RDBMS/Slide 14
Database Users
DBMS Users
The users of the database system can be classified into the following groups, depending on their degree of expertise or mode of interaction with the DBMS:
Naive Users
Online Users
Application Programmers
Database Administrator
RDBMS/Slide 15
Database Users
Naive User: Users who need not be aware of the presence of the database system, or of any other system supporting their usage, are considered naive users. A user of an ATM falls in this category.
RDBMS/Slide 16
Database Users
Online Users: These are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediary of the application program. Online users can also be naive users requiring additional help, such as menus.
RDBMS/Slide 17
Database Users
Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces used by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as C or VB and include the commands required to manipulate the database.
RDBMS/Slide 18
Database Users
Database Administrator: Centralized control of the database is exerted by a person or group under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
RDBMS/Slide 19
Database Users
Database Administrator (Contd.): The DBA is responsible for granting permissions to the users of the database and stores the profile of each user in the database. The DBA is also responsible for defining the procedures to recover the database from failures due to human, natural, or hardware causes with minimal loss of data.
RDBMS/Slide 20
Data Models
Data models can be classified into two categories:
Object-based logical model: focuses on describing the data, the relationships among the data, and any constraints defined.
Record-based logical model: focuses on describing the data structure and the access techniques in the Database Management System.
RDBMS/Slide 21
Data Models
Object-based Logical Model
There are various object-based models. The most widely used is the entity-relationship model (E/R model).
RDBMS/Slide 22
Data Models
The Entity-relationship Model
Introduced by Peter Chen. Chen introduced not only the E/R model but also a corresponding diagramming technique.
RDBMS/Slide 23
RDBMS/Slide 24
RDBMS/Slide 25
RDBMS/Slide 26
RDBMS/Slide 27
E/R diagram: the entities Supplier and Parts are connected by the relationship Ships.
RDBMS/Slide 28
Types of Relationships
There are three types of relationships:
One-to-One
One-to-Many (or Many-to-One)
Many-to-Many
RDBMS/Slide 29
Types of Relationships
One-to-One Relationship
Consider the example of a university. For one DEPARTMENT (like the department of social sciences) there can be only one department head. This is a one-to-one relationship.
RDBMS/Slide 30
Types of Relationships
Many-to-One Relationship
A STUDENT can MAJOR in only one course, but many STUDENTs would have registered for a given MAJOR course. This is a many-to-one relationship.
RDBMS/Slide 31
Types of Relationships
Many-to-Many Relationship
An employee might learn many job skills, and each job skill might be learned by many employees.
E/R diagram: EMPLOYEE - LEARNS - SKILL
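The three relationship types can be told apart by counting how many partners each side has. The sketch below is only an illustration: the function name classify and the sample pairs are hypothetical, and it inspects one set of observed pairs rather than the rule stated in a schema.

```python
from collections import defaultdict

def classify(pairs):
    """Classify a set of (left, right) pairs by the fan-out seen on each side."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # True when some left item is paired with more than one right item.
    left_has_many = any(len(v) > 1 for v in left_to_right.values())
    # True when some right item is paired with more than one left item.
    right_has_many = any(len(v) > 1 for v in right_to_left.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical sample data modelled on the three examples in the slides.
department_head = [("Social Sciences", "Head A"), ("Physics", "Head B")]
student_major = [("Ann", "History"), ("Bob", "History"), ("Cy", "Physics")]
employee_skill = [("Jones", "Welding"), ("Jones", "Typing"), ("Smith", "Typing")]

print(classify(department_head))
print(classify(student_major))
print(classify(employee_skill))
```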
RDBMS/Slide 32
Types of Relationships
Just a Minute
1. What do the following E/R diagrams represent?
RDBMS/Slide 33
Types of Relationships
RDBMS/Slide 34
Types of Relationships
RDBMS/Slide 35
RDBMS/Slide 36
RDBMS/Slide 37
RDBMS/Slide 38
RDBMS/Slide 39
RDBMS/Slide 40
RDBMS/Slide 41
A Hierarchical Structure
RDBMS/Slide 42
RDBMS/Slide 43
RDBMS/Slide 44
RDBMS/Slide 45
RDBMS/Slide 46
RDBMS/Slide 47
RDBMS/Slide 48
RDBMS/Slide 49
RDBMS/Slide 50
RDBMS/Slide 51
RDBMS/Slide 52
Relation P
RDBMS/Slide 53
RDBMS/Slide 54
Cartesian Product

Relation P
ID   NAME
101  Jones
103  Smith
104  James
106  Byron
107  Evan

Relation Q
S (Software Packages)
J1
J2

Relation P x Q
ID   NAME   S
101  Jones  J1
101  Jones  J2
103  Smith  J1
103  Smith  J2
104  James  J1
104  James  J2
106  Byron  J1
106  Byron  J2
107  Evan   J1
107  Evan   J2
RDBMS/Slide 55
Relation P
Relation P U Q
RDBMS/Slide 56
Intersection

Relation Q
NAME
Smith
James
Byron
Drew

Relation P ∩ Q
ID   NAME
103  Smith
104  James
110  Drew
RDBMS/Slide 57
Difference

Relation Q
NAME
Smith
James
Byron
Drew

Relation P - Q
ID   NAME
101  Jones
107  Evan
112  Smith
RDBMS/Slide 58
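The product, union, intersection, and difference operators can be sketched with Python sets of tuples. This is an illustration under stated assumptions, not the slides' own notation: modelling a relation as a set of tuples is a simplification, and the relations P2 and Q2 below are assumed sample data, chosen so that their intersection and difference agree with the result tables in the slides.

```python
# Relations modelled as sets of tuples (a simplifying assumption).
P = {(101, "Jones"), (103, "Smith"), (104, "James"), (106, "Byron"), (107, "Evan")}
Q = {("J1",), ("J2",)}

def product(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    return {a + b for a in r for b in s}

p_times_q = product(P, Q)

# Assumed union-compatible relations (same attributes: ID, NAME).
P2 = {(101, "Jones"), (103, "Smith"), (104, "James"),
      (107, "Evan"), (110, "Drew"), (112, "Smith")}
Q2 = {(103, "Smith"), (104, "James"), (106, "Byron"), (110, "Drew")}

union = P2 | Q2          # tuples in P2, in Q2, or in both
intersection = P2 & Q2   # tuples in both P2 and Q2
difference = P2 - Q2     # tuples in P2 that are not in Q2

for row in sorted(p_times_q):
    print(row)
print(sorted(intersection))
print(sorted(difference))
```

Because sets discard duplicates, the union holds each common tuple only once, which matches the relational definition.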
RDBMS/Slide 59
RDBMS/Slide 60
Codd Rules
When is a DBMS Relational? - Codd's Rules
Dr. Edgar Frank Codd, an IBM researcher, first developed the relational model in 1970. In 1985, Dr. Codd published a list of 12 rules (preceded by a foundational Rule 0) that concisely define an ideal relational database and that have served as a guideline for the design of relational database systems.
RDBMS/Slide 61
Codd Rules
Codd's Rules can be divided into 5 functional areas:
Foundation Rules
Structural Rules
Integrity Rules
Data Manipulation Rules
Data Independence Rules
RDBMS/Slide 62
Codd Rules
Foundation Rules (Rules 0 & 12):
Rule 0: Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities. All data definition & manipulation must be possible through relational operations.
RDBMS/Slide 63
Codd Rules
Rule 12 - No subversion Rule: If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules & constraints expressed in the higher-level relational language. All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA. This does not prohibit the use of record-at-a-time languages, e.g. PL/SQL.
RDBMS/Slide 64
Codd Rules
Structural Rules (Rules 1 & 6): The fundamental structural construct is the table. Codd states that an RDBMS must support tables, domains, primary & foreign keys. Each table should have a primary key.
RDBMS/Slide 65
Codd Rules
Rule 1 - Information Rule: All info in a RDB is represented explicitly at the logical level in exactly one way - by values in a table. ALL info even the Metadata held in the system catalogue MUST be stored as relations (tables) & manipulated in the same way as data.
RDBMS/Slide 66
Codd Rules
Rule 6 - View Updating: All views that are theoretically updatable are updatable by the system. Not really implemented yet by any available system.
RDBMS/Slide 67
Codd Rules
Integrity Rules (Rules 3 & 10): Integrity should be maintained by the DBMS, not the application.
Rule 3 - Systematic treatment of null values: Null values are supported for the representation of 'missing' & inapplicable information in a systematic way & independent of data type.
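To illustrate that a null is treated differently from an empty string or a zero, here is a small sketch using Python's built-in sqlite3 module. The table reading and its columns are hypothetical; SQLite is used only because it ships with Python, and other systems may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, note TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [(1, None, None),   # both values missing (NULL)
     (2, "", 0),        # empty string and zero are real values
     (3, "ok", 5)],
)

def ids(sql):
    """Return the id column of every row matched by the query."""
    return [row[0] for row in conn.execute(sql)]

null_notes = ids("SELECT id FROM reading WHERE note IS NULL")
empty_notes = ids("SELECT id FROM reading WHERE note = ''")
zero_amounts = ids("SELECT id FROM reading WHERE amount = 0")

# Comparing NULL with NULL yields NULL (unknown), which Python sees as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(null_notes, empty_notes, zero_amounts, null_equals_null)
```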
RDBMS/Slide 68
Codd Rules
Rule 10 - Integrity independence: Integrity constraints specific to a particular RDB must be definable in the relational data sublanguage & storable in the DB, NOT the application program. This gives the advantage of centralised control & enforcement
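A constraint stored in the database is enforced for every program that writes to it. The sketch below uses Python's built-in sqlite3 module; the account table, its CHECK constraint, and the helper try_insert are hypothetical examples, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule "balance may not be negative" is declared in the schema itself,
# so no application program has to re-implement it.
conn.execute("""
    CREATE TABLE account (
        acct_id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")

def try_insert(acct_id, balance):
    """Attempt an insert; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (acct_id, balance))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(1, 50)
rejected = try_insert(2, -10)   # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

print(accepted, rejected, row_count)
```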
RDBMS/Slide 69
Codd Rules
Data Manipulation Rules (Rules 2, 4, 5 & 7): The user should be able to manipulate the 'Logical View' of the data with no need for knowledge of how it is physically stored or accessed.
Rule 2 - Guaranteed Access: Each & every datum in an RDB is guaranteed to be logically accessible by a combination of table name, primary key value & column name.
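The guaranteed access rule can be sketched as a lookup that needs nothing beyond a table name, a primary key value, and a column name. The example uses Python's built-in sqlite3 module; the employee table and the helper get_datum are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO employee VALUES (101, 'Jones', 'Sales'), (103, 'Smith', 'Audit');
""")

def get_datum(table, pk_column, pk_value, column):
    """Fetch one value addressed by table name, primary key value, and column name."""
    # Identifiers cannot be bound as SQL parameters, so they are validated first.
    for identifier in (table, pk_column, column):
        if not identifier.isidentifier():
            raise ValueError(f"not a plain identifier: {identifier!r}")
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {pk_column} = ?", (pk_value,)
    ).fetchone()
    return None if row is None else row[0]

print(get_datum("employee", "id", 103, "name"))
print(get_datum("employee", "id", 101, "dept"))
```

No pointer, record position, or storage detail is involved in the lookup, which is the point of the rule.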
RDBMS/Slide 70
Codd Rules
Rule 4 - Dynamic on-line Catalog based on relational model: The DB description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as regular data. System & other data are stored & manipulated in the same way.
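As one concrete instance of a queryable catalog, SQLite exposes its schema through the sqlite_master table, which can be read with an ordinary SELECT. The sketch below uses Python's built-in sqlite3 module; the supplier and part tables are hypothetical, and other systems name their catalog tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT)")
conn.execute("CREATE TABLE part (pno INTEGER PRIMARY KEY, pname TEXT)")

# The catalog is queried with the same language as user data.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# The stored definition of one table, again retrieved by a plain SELECT.
supplier_ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", ("supplier",)
).fetchone()[0]

print(table_names)
print(supplier_ddl)
```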
RDBMS/Slide 71
Codd Rules
Rule 5 - Comprehensive Data Sublanguage: An RDBMS may support many languages & modes of use, but there must be at least ONE language whose statements can express ALL of the following:
> Data Definition
> View Definition
> Data manipulation (interactive & via program)
> Integrity constraints
> Authorization
> Transaction boundaries (begin, commit & rollback)
The 1992 ISO standard for SQL provides all these functions.
RDBMS/Slide 72
Codd Rules
Rule 7 - High-level insert, update & delete: Capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update & delete operations.
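A set-at-a-time update or delete names the affected rows by a condition instead of visiting them one by one. The sketch below uses Python's built-in sqlite3 module; the employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        (101, 'Jones', 'Sales', 1000),
        (103, 'Smith', 'Sales', 1200),
        (104, 'James', 'Audit', 1500);
""")

# One statement updates every row in the selected set; no row-by-row loop.
cursor = conn.execute("UPDATE employee SET salary = salary + 100 WHERE dept = 'Sales'")
updated_rows = cursor.rowcount

# Delete also operates on a whole set of rows at once.
cursor = conn.execute("DELETE FROM employee WHERE dept = 'Audit'")
deleted_rows = cursor.rowcount

remaining = conn.execute("SELECT id, salary FROM employee ORDER BY id").fetchall()
print(updated_rows, deleted_rows, remaining)
```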
RDBMS/Slide 73
Codd Rules
Data Independence Rules (Rules 8, 9 & 11): These rules protect users & application developers from having to change their applications following any low-level reorganisation of the DB.
RDBMS/Slide 74
Codd Rules
Rule 8 - Physical Data Independence: Application Programs & Terminal Activities remain logically unimpaired whenever any changes are made either to the storage organisation or access methods.
RDBMS/Slide 75
Codd Rules
Rule 9 - Logical Data Independence: Application Programs & Terminal Activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
RDBMS/Slide 76
Codd Rules
Rule 11 - Distribution Independence: The data manipulation sublanguage of an RDBMS must enable application programs & queries to remain logically unchanged whether & whenever data is physically centralised or distributed. This means that an Application Program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment. The user should 'see' one centralised DB whether the data is located on one or more computers. This rule does not say that a DBMS must support distributed DBs to be fully relational, but that if it does, the query must remain the same.
RDBMS/Slide 77
Codd Rules
Summary of Codd Rules:
Rule 1: The Information Rule: All information in a relational database is represented explicitly at the logical level in exactly one way: by values in tables.
Rule 2: Guaranteed Access Rule: All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.
Rule 3: Systematic Treatment of Null Values: A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a non-null field constraint that prevents null values in a specific table column.
Rule 4: Dynamic On-Line Catalog Based on the Relational Model: A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.
RDBMS/Slide 78
Codd Rules
Summary of Codd Rules (Contd.):
Rule 5: Comprehensive Data Sublanguage Rule: The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language.
Rule 6: View Updating Rule: Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.
Rule 7: High-level Insert, Update, and Delete: Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
RDBMS/Slide 79
Codd Rules
Summary of Codd Rules (Contd.):
Rule 8: Physical Data Independence: The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.
Rule 9: Logical Data Independence: How a user views data should not change when the logical structure (table structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.
Rule 10: Integrity Independence: The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL:
No component of a primary key can have a null value (see Rule 3).
If a foreign key is defined in one table, any value in it must exist as a primary key in another table.
RDBMS/Slide 80
Codd Rules
Summary of Codd Rules (Contd.):
Rule 11: Distribution Independence: A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations).
Rule 12: No subversion Rule: There should be no way to modify the database structure other than through the multiple-row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the data structure.
Note: Rules 6, 9, 10, 11 and 12 are difficult to satisfy.
RDBMS/Slide 81
Summary
In this lesson, you learned that:
Data models can be classified as:
  Object-based models
  Record-based models
In the entity-relationship diagramming technique:
  Entities are represented as rectangles
  Relationships are represented as diamonds
  Attributes are represented as ellipses
  Relationships, whether many-to-many, one-to-many, or one-to-one, are represented symbolically
RDBMS/Slide 82
Summary
(Contd.)
  Weak entities are represented in double-lined boxes
  Subtypes are connected to the supertype by an unnamed relationship, marked with a crossbar on top
In the relational model, data is represented in tables (relations) of rows (tuples) and columns (attributes)
The number of tuples is called the cardinality of the relation, and the number of attributes is called the degree of the relation
An attribute (or set of attributes) that is unique in every tuple is called the primary key
RDBMS/Slide 83
Summary
(Contd.)
Unknown or missing information is represented by a NULL in a table
The foreign key is a column in one table that matches the primary key of another table
The relational model is based on the principle of relational algebra
The eight operators that operate on relations are restrict, project, product, union, intersect, difference, join, and divide
RDBMS/Slide 84
Relational Model
Keys are special fields that serve two main purposes:
Primary keys are unique identifiers of the rows of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique.
Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship).
Keys can be simple (a single field) or composite (more than one field).
RDBMS/Slide 85
Relational Model
Combined, these are a composite primary key (uniquely identifies the order line); individually, they are foreign keys (implement the M:N relationship between order and product).
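A sketch of such an order-line table follows, using Python's built-in sqlite3 module. The table and column names (orders, product, order_line, order_id, product_id, quantity) are assumptions for illustration, since the slide's figure is not available in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)   -- composite primary key
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO product VALUES (10, 'Bolt'), (20, 'Nut');
""")

def add_line(order_id, product_id, quantity):
    """Insert an order line; return False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO order_line VALUES (?, ?, ?)", (order_id, product_id, quantity)
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_line(1, 10, 5),   # order 1 contains product 10
    add_line(1, 20, 2),   # the same order may contain another product
    add_line(2, 10, 1),   # the same product may appear in another order
    add_line(1, 10, 9),   # repeated (order, product) pair: rejected
    add_line(3, 10, 1),   # order 3 does not exist: rejected by the foreign key
]
print(results)
```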
RDBMS/Slide 86
Relational Model
A null value for an attribute is a value that is either not known at the time or does not apply to a given instance of the object. It may also be possible that a particular tuple does not have a value for an attribute.
RDBMS/Slide 87
Relational Model
Integrity Rule 1 (Contd.): If any attribute of a primary key were permitted to have a null value, the key could not be used for the identification of tuples. This contradicts the requirements for a primary key. If null values were permitted, then two tuples <@, Smith>, where @ denotes a null value, would be indistinguishable even though they may represent two different instances of the entity type employee. Integrity Rule 1 specifies that instances of the entities are distinguishable and thus no prime attribute value may be null. This rule is also referred to as the Entity Rule.
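The entity rule can be sketched with Python's built-in sqlite3 module. The employee table and the helper add_employee are hypothetical. Note one SQLite-specific assumption: SQLite accepts NULL in a non-integer primary key column unless NOT NULL is declared, so the declaration below states it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT NULL is spelled out because SQLite would otherwise accept a NULL
# in a TEXT primary key column.
conn.execute("""
    CREATE TABLE employee (
        emp_no TEXT NOT NULL PRIMARY KEY,
        name   TEXT
    )
""")

def add_employee(emp_no, name):
    """Insert a tuple; return False if the key constraints reject it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_no, name))
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = add_employee("E1", "Smith")
null_key = add_employee(None, "Smith")    # null primary key: rejected
duplicate = add_employee("E1", "Jones")   # repeated key value: rejected

print(ok_first, null_key, duplicate)
```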
RDBMS/Slide 88
Relational Model
Integrity Rule 2: Integrity Rule 2 is concerned with foreign keys, i.e., with attributes of a relation having domains that are those of the primary key of another relation. A relation (R) may contain references to another relation (S). Relations R and S need not be distinct. Suppose the reference in R is via a set of attributes that forms a primary key of the relation S. This set of attributes in R is a foreign key. The referencing attribute(s) in the relation R can have null value(s); in this case, R is not referencing any tuple in the relation S. If the value is not null, it must exist as the primary key value of a tuple of the relation S.
Integrity Rule 2 specifies the following: given two relations R and S, suppose R refers to the relation S via a set of attributes that forms the primary key of S, and this set of attributes forms a foreign key in R. Then the value of the foreign key in a tuple in R must either be equal to the primary key of a tuple of S or be entirely null. This rule is also referred to as the Referential Integrity Rule.
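The referential integrity rule can be sketched with Python's built-in sqlite3 module, where employee plays the role of R and department the role of S. The table names, columns, and the helper add_employee are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is an implementation detail of that system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)   -- foreign key, may be null
    );
    INSERT INTO department VALUES (10, 'Sales');
""")

def add_employee(emp_id, name, dept_id):
    """Insert a tuple of R; return False if the foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

matches_parent = add_employee(1, "Smith", 10)     # equals a primary key in S
entirely_null = add_employee(2, "Jones", None)    # null: references no tuple
dangling = add_employee(3, "Evan", 99)            # no such department: rejected

print(matches_parent, entirely_null, dangling)
```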
RDBMS/Slide 89
Relational Model
Relational Calculus: Tuple and domain calculus are collectively referred to as relational calculus. Relational calculus is a query system wherein queries are expressed as variables and formulas on these variables. Such formulas describe the properties of the required result without specifying the method of evaluating it. Relational calculus, which means calculating with relations, is based on predicate calculus, which is calculating with predicates. Predicate calculus is a formal language used to symbolize logical arguments in mathematics. In formal logic, the main subject matter is propositions, from which we can build other propositions. In predicate calculus, propositions may be built not only out of other propositions but also out of elements that are not themselves propositions. Propositions specifying a property consist of an expression that names an individual object and another expression, called the predicate, that stands for the property that the individual object possesses.
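Python set comprehensions read much like calculus expressions: they state the property the result must satisfy rather than the steps to compute it. The sketch below is an analogy only; the EMPLOYEE relation, its attributes, and the sample tuples are hypothetical.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["id", "name", "dept"])

# Hypothetical relation, modelled as a set of tuples.
EMPLOYEE = {
    Employee(101, "Jones", "Sales"),
    Employee(103, "Smith", "Audit"),
    Employee(104, "James", "Sales"),
}

# Tuple calculus style: { t | t in EMPLOYEE and t.dept = 'Sales' }
# The variable t ranges over whole tuples.
sales_tuples = {t for t in EMPLOYEE if t.dept == "Sales"}

# Domain calculus style: { <n> | exists i, d : <i, n, d> in EMPLOYEE and d = 'Sales' }
# The variables i, n, d range over individual attribute values.
sales_names = {n for (i, n, d) in EMPLOYEE if d == "Sales"}

print(sorted(sales_tuples))
print(sorted(sales_names))
```

In both forms the predicate `dept == "Sales"` is the property that each selected object must possess.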
RDBMS/Slide 90