Unit 2 Database Core Concepts and Application Structure

2.1 Introduction
    Objectives
    Self Assessment Question(s) (SAQs)
2.2 Data Model, Schemas and Instances
    2.2.1 The Three-Schema Architecture
    2.2.2 Data Independence
    Self Assessment Question(s) (SAQs)
2.3 Database Languages and Interfaces
    2.3.1 DBMS Languages
        2.3.1.1 DDL
        2.3.1.2 Data Manipulation Languages (DMLs)
        2.3.1.3 DBMS Interfaces
    Self Assessment Question(s) (SAQs)
2.4 DBMS Components
    2.4.1 Data Manager
    2.4.2 Query processor
    2.4.3 DDL compiler
    2.4.4 Run-time Data Base Processor
    2.4.5 Pre-compiler
    2.4.6 Database System Utilities
    Self Assessment Question(s) (SAQs)
2.5 Classification of Database Management Systems
    Self Assessment Question(s) (SAQs)
2.6 Summary
2.7 Terminal Questions (TQs)
2.8 Multiple Choice Questions (MCQs)
2.9 Answers to SAQs, TQs, and MCQs
    2.9.1 Answers to Self Assessment Questions (SAQs)
    2.9.2 Answers to Terminal Questions (TQs)
    2.9.3 Answers to Multiple Choice Questions (MCQs)

Database Management Systems Unit 2 Sikkim Manipal University Page No.: 15

2.1 Introduction
A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested, including the hierarchical model, the network model, and the relational model.

Objectives
To know about:
- Data model, schemas and instances
- DBMS architecture and data independence
- Database languages and interfaces
- DBMS languages
- DBMS interfaces
- Classification of database management systems

Self Assessment Question(s) (SAQs) (for section 2.1)
1. Define database model. List some data models.

2.2 Data Model, Schemas and Instances

Data Model
A data model is a set of concepts for viewing a set of data in a structured way. It can be understood by both professionals and non-technical users, and it can explain the way in which the organization uses and manages its information.

Concepts used in a Data Model

Entity
An entity is something that has a distinct, separate existence, though it need not be a material existence.
E.g. Employee.

Attribute
An attribute is a characteristic or property that describes an entity, such as weight, size, or color.

Relationship
A relationship describes an association between two or more entities.

Schemas
Describing a database means defining the names, data types, and sizes of the columns in each table. This description is distinct from the database itself [the actual data in the tables]. The description of a database is called the database schema [or the metadata]. It is specified during database design and is not changed frequently.

Roll No.   Name   Semester   Branch

Instances
The collection of data stored in the database at a particular moment is called a database instance, database state, or snapshot. It changes very frequently as data is added, deleted, and modified.

Roll No.   Name            Semester   Branch
1          Rajesh Prabhu   ii         E & C

2.2.1 The Three-Schema Architecture
The three-schema architecture has an internal level, a conceptual level, and an external level. The advantage of this division into levels is that developers and users can each work at their own level: they do not need to know the details of the other levels, and they do not have to know anything about changes made at the other levels. Note that each of these schemas is only a description of data; the data itself exists only at the physical level.

1. Internal level
This is a description of the physical storage structure of the database. Operations performed here are translated into modifications of the contents and structure of the files. This level has an internal schema, which describes the complete details of the stored records and the access methods used to achieve efficient access to the data.

2. Conceptual level
This hides the details of physical storage structures and concentrates on describing entities.
This level is independent of both software and hardware.

3. External level or view level
This is the outermost layer and the one closest to the users. The data as viewed by individual users is described at the external level.

Fig. 2.1: The Three-Schema Architecture [diagram: external views at the external level are connected by mappings to the conceptual schema at the conceptual level, which is connected by mappings to the internal schema at the internal level, above the stored database]

2.2.2 Data Independence
The ability to modify a schema definition at one level without affecting a schema definition at a higher level is called data independence. There are two kinds:

1. Physical data independence: This is the ability to modify the physical scheme without causing application programs to be rewritten. Modifications at this level are usually made to improve performance.

2. Logical data independence: This is the ability to modify the conceptual scheme without causing application programs to be rewritten. This is usually done when the logical structure of the database is altered. Logical data independence is harder to achieve, because application programs are usually heavily dependent on the logical structure of the data. An analogy can be made to abstract data types in programming languages.

Self Assessment Question(s) (SAQs) (for section 2.2)
1. Explain the concepts used in the data model.
2. Define schema and instance and give one example of each.
3. Explain the three-schema architecture with a neat diagram.
4. Define data independence and distinguish between physical and logical data independence.

2.3 Database Languages and Interfaces
As a database supports a number of user groups, the DBMS must have languages and interfaces that support each of these user groups.

2.3.1 DBMS Languages

2.3.1.1 DDL
DDL, the data definition language, is used by the DBA and database designers to define the conceptual and internal schemas.
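
As a minimal sketch of what a DDL statement looks like in practice, the example below uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption; this unit does not name a particular system). The table name `student` and its column names are hypothetical, modelled on the Roll No./Name/Semester/Branch example in section 2.2. The helper functions are illustrative only.

```python
import sqlite3


def define_student_schema(conn):
    """Run one DDL statement that defines the (hypothetical) student table."""
    conn.execute(
        "CREATE TABLE student ("
        " roll_no INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " semester TEXT,"
        " branch TEXT)"
    )


def stored_schema(conn, table):
    """Read a table's description back from SQLite's built-in sqlite_master table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
define_student_schema(conn)

# The schema now exists, but the instance is still empty: no rows yet.
print(stored_schema(conn, "student"))
print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])
```

The DDL statement only describes the data; it stores no student records. Reading the definition back shows that SQLite keeps the description separately from the rows themselves.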
The DBMS has a DDL compiler that processes DDL statements in order to identify the schema constructs and to store their description in the catalogue. In databases where there is a separation between the conceptual and internal schemas, the DDL is used to specify the conceptual schema, and the SDL, the storage definition language, is used to specify the internal schema. For a true three-schema architecture, the VDL, the view definition language, is used to specify the user views and their mappings to the conceptual schema. In most DBMSs, however, the DDL is used to specify both the conceptual schema and the external schemas.

2.3.1.2 Data Manipulation Languages (DMLs)
A Data Manipulation Language (DML) is one of a family of computer languages used by computer programs or database users to retrieve, insert, delete, and update data in a database.

Currently, the most popular data manipulation language is that of SQL, which is used to retrieve and manipulate data in a relational database. Other forms of DML are those used by IMS/DL1, CODASYL databases (such as IDMS), and others.

Data manipulation languages were initially used only by computer programs, but (with the advent of SQL) have come to be used by people as well.

Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are "select", "insert", "update", and "delete".

Data manipulation languages tend to have many different "flavors" and capabilities between database vendors. ANSI has established a standard for SQL, but vendors still "exceed" the standard and provide their own extensions.

There are two main types of DML:

High-level/Non-procedural
A high-level DML can be used on its own to specify complex database operations. DBMSs allow DML statements to be entered interactively from a terminal, or to be embedded in a programming language.
If the commands are embedded in a general-purpose programming language, the statements must be identified so that they can be extracted by a pre-compiler and processed by the DBMS.

Low-level/Procedural
A low-level DML must be embedded in a general-purpose programming language. It typically retrieves individual records or objects from the database and processes each one separately, so it needs programming language constructs such as loops. For this reason, low-level DMLs are also called record-at-a-time DMLs. High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement, and are called set-at-a-time or set-oriented DMLs. High-level languages are often called declarative, because the DML specifies what to retrieve rather than how to retrieve it.

2.3.1.3 DBMS Interfaces
Types of interfaces provided by the DBMS include:

Menu-Based Interfaces for Web Clients or Browsing:
These present users with lists of options (menus) that lead the user through the formulation of a request. The query is composed from the options selected from the menus displayed by the system.

Forms-Based Interfaces:
These display a form to each user. The user can fill out a form to insert new data, or fill out only certain entries, in which case the DBMS retrieves matching data for the remaining entries. They are designed and programmed for naive users as interfaces to canned transactions.

Graphical User Interfaces:
These display a schema to the user in diagram form. The user can specify a query by manipulating the diagram. GUIs use both forms and menus.

Natural Language Interfaces:
These accept requests written in English or another language and attempt to understand them. The interface has its own schema and a dictionary of important words, and it uses the schema and dictionary to interpret a natural language request.

Interfaces for Parametric Users:
Parametric users have a small set of operations that they perform.
Analysts and programmers design and implement a special interface for each class of naive users. Often a small set of commands is included to minimize the number of keystrokes required (e.g., function keys).

Interfaces for the DBA:
Systems contain privileged commands that only DBA staff may use. These include commands for creating accounts, setting parameters, authorizing accounts, changing the schema, reorganizing the storage structures, etc.

Self Assessment Question(s) (SAQs) (for section 2.3)
1. Define DDL. How is it processed?
2. What do you mean by DML? Explain.
3. Explain the different types of DML.
4. Explain the different types of DBMS interfaces and also write down the interfaces for the DBA.

2.4 DBMS Components
[Figure: DBMS components. Users: DBA staff, casual users, application programmers, parametric users. Inputs: DDL statements, privileged commands, interactive query, application programs, compiled (canned) transactions, DML statements. Modules: DDL compiler, query compiler, pre-compiler, host language compiler, DML compiler, run-time database processor, stored data manager, concurrency control and backup/recovery subsystems. Data: system catalog/data dictionary, stored database.]

2.4.1 Data Manager
- It is the most important component of a DBMS.
- It controls access to the DBMS information stored on disk.
- It is responsible for interfacing with the file system.
- It converts user queries, passed on by the query processor, into operations on the physical file system.
- It synchronizes the simultaneous operations performed by concurrent users.
- It maintains the consistency and integrity of the data.
- It is responsible for backup and recovery, concurrency control, security, and integrity.
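
The consistency and integrity duties listed above can be illustrated on a small scale. The sketch below again assumes Python's built-in sqlite3 module as a stand-in DBMS. The `student` table and the `insert_students` helper are hypothetical names introduced only for this example. It shows one narrow case: a batch of inserts that violates an integrity rule is rejected, and the work already done for that batch is undone.

```python
import sqlite3


def insert_students(conn, rows):
    """Insert all rows as one transaction; undo everything if any row is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO student (roll_no, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

ok = insert_students(conn, [(1, "Rajesh Prabhu")])
# The second batch repeats roll_no 1, so the primary-key rule rejects it,
# and the row with roll_no 2 from the same batch is rolled back as well.
rejected = insert_students(conn, [(2, "Hypothetical Student"), (1, "Duplicate")])
count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(ok, rejected, count)
```

Treating the batch as a single unit keeps the stored data in a consistent state: either every row of the batch is stored or none is. Concurrency control and full backup and recovery are outside the scope of this sketch.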
2.4.2 Query processor
The query processor converts an online user's query into an efficient series of operations and sends it to the data manager for execution. It uses the data dictionary to find the structure of schema objects [tables, indexes, and stored procedures].

2.4.3 DDL compiler
The DDL compiler processes schema definitions specified in the DDL and stores descriptions of the schema [metadata] in the DBMS catalogue. The catalogue includes information such as the names of the files, the data items, the storage details of each file, and the constraints.

2.4.4 Run-time Data Base Processor
The run-time database processor handles database access at run time. It receives retrieval or update operations and carries them out on the database.

2.4.5 Pre-compiler
DML commands in an application program written in a host programming language are extracted by a pre-compiler. These commands are sent to the DML compiler for compilation into object code for database access. The rest of the program is sent to the host language compiler.

2.4.6 Database System Utilities
Database utilities help the DBA in managing the database system.

UTILITY                  FUNCTION
Load                     Loads data from a data file into the database
Backup                   Creates a backup copy of the database
File reorganization      Reorganizes a database file into a different file organization
Performance monitoring   Monitors database usage by providing statistics to the DBA

Self Assessment Question(s) (SAQs) (for section 2.4)
1. Explain the functionality of the data manager.
2. Write a note on the query processor, DDL compiler, run-time database processor, and pre-compiler.

2.5 Classification of Database Management Systems
Several criteria are normally used to classify DBMSs:
A. Based on data model
B. Based on number of users
C. Based on number of sites

A. Based on data model: The data model specifies a particular mechanism for data storage and retrieval.
The primary difference between the different database models lies in the methods of expressing relationships and constraints among the data elements. Five database models are discussed here:

1. Hierarchical Model: It is one of the oldest database models [1960s] and represents data as hierarchical tree structures.

2. Network Model: It represents data as record types and is able to handle many-to-many relationships.

3. Relational Model: The relational model stores data in the form of tables. Data is interrelated; relationships link rows from two tables. End users need not know about physical data storage details, so the model is conceptually simple. A relational database is data driven, not design driven: it is designed once, and the data changes over time without affecting the applications. Data is stored once, so maintaining consistency among all applications is easier. The presence of a powerful query language (SQL) is one of the main reasons for the immense popularity of the relational database model; it allows the user to specify what must be done without specifying how it must be done. The relational model is based on mathematical theory, whose principles were laid down by Dr. E. F. Codd. It uses a collection of tables to represent both data and the relationships among those data. Codd proposed a set of 12 rules that a database product must satisfy to qualify as relational.

4. Object Oriented Model: It is based on a collection of objects. An object oriented database manages objects and is suited to multimedia applications as well as to data with complex relationships that are difficult to model and process in a relational DBMS. An Object Oriented Database Management System [OODBMS] holds data, text, pictures, voice, and video. An OODBMS contains values stored in instance variables, together with methods or functions that control the behavior of those variables.
OOP concepts such as inheritance, polymorphism, and dynamic binding improve productivity.

5. Object-Relational Model: It combines object oriented concepts with relational concepts. It brings in the advantages of modern object oriented programming languages, which allow users to define new data types and functions of their own. A relational database is useful not only for storing data but also for holding the business rules that are applied to the data. Associating rules with data makes the data more active, enabling the database system to perform automatic validity checks and to automate many business procedures. The model supports specialized applications such as image retrieval, searching, and multimedia.
E.g.: IBM's DB2 Universal Server, Oracle 8, SQL Server 7, and so on.

B. Based on number of users: A single-user system supports only one user at a time, and a multi-user system supports multiple users concurrently.

C. Based on number of sites: A DBMS is centralized if the data is stored at a single computer site. A DBMS is distributed if the data and the DBMS software are distributed over many sites connected by a computer network.

Self Assessment Question(s) (SAQs) (for section 2.5)
1. What are the criteria used for the classification of DBMSs?

2.6 Summary
In this unit we have studied the basic concepts of a DBMS: data model, schemas and instances, the three-schema architecture, data independence, DBMS interfaces, and the DBMS components (data manager, query processor, DDL compiler, run-time database processor, and pre-compiler). We have also studied database system utilities and the classification of database systems by data model, such as the hierarchical, network, relational, and object oriented models.
2.7 Terminal Questions (TQs)
1. Define data model and discuss the categories of data models. (2.2)
2. Write a note on schemas. (2.2)
3. Write a note on data independence. (2.2.2)
4. What is the difference between logical data independence and physical data independence? (2.2.2)

2.8 Multiple Choice Questions (MCQs)
1. -------- is used to convert an online user's query into an efficient series of operations and send it to the data manager for execution.
   A. Query processor
   B. DDL compiler
   C. Run-time database processor
   D. Pre-compiler
2. -------- processes schema definitions specified in the DDL and stores descriptions of the schema [metadata] in the DBMS catalogue.
   A. Query processor
   B. DDL compiler
   C. Run-time database processor
   D. Pre-compiler
3. -------- handles database access at run time.
   A. Query processor
   B. DDL compiler
   C. Run-time database processor
   D. Pre-compiler
4. DML commands from an application program written in a host programming language are extracted by a
   A. Query processor
   B. DDL compiler
   C. Run-time database processor
   D. Pre-compiler

2.9 Answers to SAQs, TQs, and MCQs

2.9.1 Answers to Self Assessment Questions (SAQs)

For section 2.1
1. A database model is a theory or specification describing how a database is structured and used. There are several such models, including the hierarchical model, the network model, and the relational model. (Refer to section 2.1)

For section 2.2
1. Entity: An entity is something that has a distinct, separate existence, though it need not be a material existence. E.g. Employee.
   Attribute: An attribute is a characteristic or property that describes an entity, such as weight, size, or color.
   Relationship: A relationship describes an association between two or more entities.
2.
Schemas: Describing a database means defining the names, data types, and sizes of the columns in each table. This description is distinct from the database itself [the actual data in the tables]. The description of a database is called the database schema [or the metadata]. It is specified during database design and is not changed frequently.

   Roll No.   Name   Semester   Branch

   Instances: The collection of data stored in the database at a particular moment is called a database instance, database state, or snapshot. It changes very frequently as data is added, deleted, and modified.

   Roll No.   Name            Semester   Branch
   1          Rajesh Prabhu   ii         E & C

   (Refer to section 2.2)
3. The three-schema architecture has an internal level, a conceptual level, and an external level. The advantage of this division into levels is that developers and users can each work at their own level: they do not need to know the details of the other levels, and they do not have to know anything about changes made at the other levels. Note that each of these schemas is only a description of data; the data itself exists only at the physical level. (Refer to section 2.2.1)
4. The ability to modify a schema definition at one level without affecting a schema definition at a higher level is called data independence. (Refer to section 2.2.2)

For section 2.3
1. DDL, the data definition language, is used by the DBA and database designers to define the conceptual and internal schemas. (Refer to section 2.3.1.1)
2. A Data Manipulation Language (DML) is one of a family of computer languages used by computer programs or database users to retrieve, insert, delete, and update data in a database. (Refer to section 2.3.1.2)
3. There are two main types of DML: High-level/Non-procedural and Low-level/Procedural. (Refer to section 2.3.1.2)
4.
Types of interfaces provided by the DBMS include:
   - Menu-Based Interfaces for Web Clients or Browsing: these present users with lists of options (menus) that lead the user through the formulation of a request. The query is composed from the options selected from the menus displayed by the system.
   - Forms-Based Interfaces
   - Graphical User Interfaces
   - Natural Language Interfaces
   - Interfaces for Parametric Users
   - Interfaces for the DBA
   (Refer to section 2.3.1.3)

For section 2.4
1. Data Manager: It is the most important component of a DBMS. It controls access to the DBMS information stored on disk. (Refer to section 2.4.1)
2. Query processor: It converts an online user's query into an efficient series of operations and sends it to the data manager for execution.
   DDL compiler: It processes schema definitions specified in the DDL and stores descriptions of the schema [metadata] in the DBMS catalogue.
   Run-time database processor: It handles database access at run time.
   Pre-compiler: DML commands in an application program written in a host programming language are extracted by a pre-compiler.
   (Refer to sections 2.4.2, 2.4.3, 2.4.4, 2.4.5)

For section 2.5
1. Several criteria are normally used to classify DBMSs:
   A. Based on data model
   B. Based on number of users
   C. Based on number of sites
   (Refer to section 2.5)

2.9.2 Answers to Terminal Questions (TQs)
1. A data model is a set of concepts for viewing a set of data in a structured way. It can be understood by both professionals and non-technical users, and it can explain the way in which the organization uses and manages its information. (Refer to section 2.2)
2. Describing a database means defining the names, data types, and sizes of the columns in each table. This description is distinct from the database itself [the actual data in the tables]. (Refer to section 2.2)
3.
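The ability to modify a schema definition at one level without affecting a schema definition at a higher level is called data independence. (Refer to section 2.2.2)
4. Physical data independence: the ability to modify the physical scheme without causing application programs to be rewritten. Modifications at this level are usually made to improve performance.
   Logical data independence: the ability to modify the conceptual scheme without causing application programs to be rewritten. It is harder to achieve, because application programs are usually heavily dependent on the logical structure of the data.
   (Refer to section 2.2.2)

2.9.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. B
3. C
4. D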