
LOVELY PROFESSIONAL UNIVERSITY

Data warehousing and data mining


(DE551)

Homework 3

SUBMITTED TO: MS. BABITA PANDEY
SUBMITTED BY: RAMANDEEP SINGH
RA17T3
7450070085
Set A

Q1. What is snowflake schema? Give an example.


ANS:

A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables that are connected to multiple dimensions.

The snowflake schema is similar to the star schema. However, in the snowflake schema the dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension represented by a single table.

Example of a Snowflake schema

A standard example is the snowflake schema equivalent of a star schema query that returns the total number of units sold by brand and by country for 1997. In this example, the snowflake schema has lower storage requirements because it eliminates many duplicate values from the dimensions themselves.
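The units-sold query can be sketched with Python's built-in sqlite3 module. All table and column names here (fact_sales, dim_store, dim_geography, dim_product, dim_brand, dim_date) are hypothetical, and the rows are made up. The store dimension is normalized into a separate geography table and the product dimension into a separate brand table, so each extra level of a dimension adds one more join.

```python
import sqlite3

# Hypothetical snowflake schema: store -> geography and product -> brand
# are normalized into their own tables.
SCHEMA = """
CREATE TABLE dim_date      (id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_geography (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_store     (id INTEGER PRIMARY KEY,
                            geography_id INTEGER REFERENCES dim_geography(id));
CREATE TABLE dim_brand     (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE dim_product   (id INTEGER PRIMARY KEY,
                            brand_id INTEGER REFERENCES dim_brand(id));
CREATE TABLE fact_sales    (date_id INTEGER, store_id INTEGER,
                            product_id INTEGER, units_sold INTEGER);
"""

# Total units sold by brand and by country for 1997.
QUERY = """
SELECT b.brand, g.country, SUM(f.units_sold)
FROM fact_sales f
JOIN dim_date d      ON f.date_id = d.id
JOIN dim_store s     ON f.store_id = s.id
JOIN dim_geography g ON s.geography_id = g.id
JOIN dim_product p   ON f.product_id = p.id
JOIN dim_brand b     ON p.brand_id = b.id
WHERE d.year = 1997
GROUP BY b.brand, g.country
ORDER BY b.brand, g.country
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows. The brand name is stored once, in dim_brand.
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 1997), (2, 1998)])
conn.executemany("INSERT INTO dim_geography VALUES (?, ?)", [(1, "India"), (2, "Brazil")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO dim_brand VALUES (?, ?)", [(1, "Acme")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 10), (1, 1, 2, 5), (1, 2, 1, 7), (2, 1, 1, 100)])

result = conn.execute(QUERY).fetchall()
print(result)  # [('Acme', 'Brazil', 7), ('Acme', 'India', 15)]
```

The 1998 row is excluded by the WHERE clause. Both products point to the same dim_brand row instead of repeating the brand text.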
Q2. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
and the two measures count and charge, where charge is the fee that a doctor charges a
patient for a visit.
(a) Enumerate three classes of schemas that are popularly used for modelling data
warehouses.
(b) Draw a schema diagram for the above data warehouse using one of the schema classes
listed in (a).
(a) The three classes of schemas that are popularly used for modelling data warehouses are:
• STAR SCHEMA: a fact table in the middle connected to a set of dimension tables.
• SNOWFLAKE SCHEMA: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
• FACT CONSTELLATION: multiple fact tables share dimension tables. Because it can be viewed as a collection of stars, it is also called a galaxy schema or fact constellation.

(b) The figure below gives a schema diagram for the above data warehouse.
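A star schema for this warehouse can also be written out as table definitions. The sketch below uses Python's sqlite3 module and assumes one fact table holding the two measures (count and charge), with foreign keys pointing directly at the time, doctor and patient dimensions. Every attribute other than the keys and the two measures (day, month, specialty, age and so on) is a hypothetical choice, and the rows are made up.

```python
import sqlite3

# Hypothetical star schema: one fact table, three dimension tables.
STAR = """
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day TEXT,
                          month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE doctor_dim  (doctor_id INTEGER PRIMARY KEY, doctor_name TEXT,
                          specialty TEXT);
CREATE TABLE patient_dim (patient_id INTEGER PRIMARY KEY, patient_name TEXT,
                          age INTEGER, gender TEXT);
CREATE TABLE fee_fact (
    time_key   INTEGER REFERENCES time_dim(time_key),
    doctor_id  INTEGER REFERENCES doctor_dim(doctor_id),
    patient_id INTEGER REFERENCES patient_dim(patient_id),
    count      INTEGER,  -- number of visits
    charge     REAL      -- fee the doctor charges the patient
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STAR)
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, "2004-01-05", 1, 1, 2004), (2, "2004-02-10", 2, 1, 2004)])
conn.executemany("INSERT INTO doctor_dim VALUES (?, ?, ?)",
                 [(1, "Dr. A", "GP"), (2, "Dr. B", "Cardiology")])
conn.executemany("INSERT INTO patient_dim VALUES (?, ?, ?, ?)",
                 [(1, "P1", 40, "F"), (2, "P2", 65, "M")])
conn.executemany("INSERT INTO fee_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 300.0), (2, 1, 2, 2, 500.0), (2, 2, 2, 1, 900.0)])

# Total charge per doctor in 2004: one join per dimension used.
rows = conn.execute("""
    SELECT d.doctor_name, SUM(f.charge)
    FROM fee_fact f
    JOIN doctor_dim d ON f.doctor_id = d.doctor_id
    JOIN time_dim t   ON f.time_key = t.time_key
    WHERE t.year = 2004
    GROUP BY d.doctor_name
    ORDER BY d.doctor_name
""").fetchall()
print(rows)
```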

Q3. A data warehouse can be modelled by either a star schema or a snowflake schema.
Briefly describe the similarities and the differences of the two models, and then analyze their
advantages and disadvantages with regard to one another. Give your opinion of which might
be more empirically useful and state the reasons behind your answer.

Star Schema:
Definition: The star schema is the simplest data warehouse schema. It is
called a star schema because the diagram resembles a star with points
radiating from a center.

It consists of a single fact table (the center of the star) surrounded by multiple dimension tables (the points of the star).

Star Schema: The star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more fact tables and dimension tables. It is called a star schema because the entity-relationship diagram between the dimensions and the fact tables resembles a star, where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table whose keys point to the dimension tables. The advantages of the star schema are better performance when slicing the data and ease of understanding the data.
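For comparison with the snowflake form of the units-sold query in Q1, here is a minimal star schema sketch. It uses Python's sqlite3 module and hypothetical tables in which brand and country are stored directly in the product and store dimension rows (denormalized). Each dimension is reached with a single join, at the cost of repeating values such as the brand name.

```python
import sqlite3

# Hypothetical star schema: brand and country live in the dimension rows.
SCHEMA = """
CREATE TABLE dim_date    (id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_store   (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_product (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE fact_sales  (date_id INTEGER, store_id INTEGER,
                          product_id INTEGER, units_sold INTEGER);
"""

# Total units sold by brand and by country for 1997: one join per dimension.
QUERY = """
SELECT p.brand, s.country, SUM(f.units_sold)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.id
JOIN dim_store s   ON f.store_id = s.id
JOIN dim_product p ON f.product_id = p.id
WHERE d.year = 1997
GROUP BY p.brand, s.country
ORDER BY p.brand, s.country
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up rows. "Acme" is repeated in dim_product; a snowflake schema
# would move it into a separate brand table.
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 1997), (2, 1998)])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "India"), (2, "Brazil")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Acme"), (2, "Acme")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 10), (1, 1, 2, 5), (1, 2, 1, 7), (2, 1, 1, 100)])

result = conn.execute(QUERY).fetchall()
print(result)  # [('Acme', 'Brazil', 7), ('Acme', 'India', 15)]
```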

In a star schema every dimension table has a primary key.

• In a star schema a dimension table does not have any parent table,
• whereas in a snowflake schema a dimension table has one or more parent tables.
• In a star schema the hierarchies for a dimension are stored in the dimension table itself,
• whereas in a snowflake schema the hierarchies are broken into separate tables.
These hierarchies help to drill down the data from the topmost level of a hierarchy to the lowermost level.
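Drilling down along a hierarchy means regrouping the same measure at successively finer levels. The sketch below assumes a hypothetical time hierarchy (year > quarter > month) stored denormalized in each row, as in a star schema, with made-up unit counts.

```python
from collections import defaultdict

# Hypothetical facts with the time hierarchy stored in each row.
sales = [
    {"year": 1997, "quarter": 1, "month": 1, "units": 10},
    {"year": 1997, "quarter": 1, "month": 2, "units": 5},
    {"year": 1997, "quarter": 2, "month": 4, "units": 7},
    {"year": 1998, "quarter": 1, "month": 1, "units": 3},
]

def aggregate(rows, levels, measure="units"):
    """Sum the measure grouped by the given hierarchy levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row[measure]
    return dict(totals)

# Drill down: each step adds the next, finer level of the hierarchy.
by_year = aggregate(sales, ["year"])
by_quarter = aggregate(sales, ["year", "quarter"])
by_month = aggregate(sales, ["year", "quarter", "month"])
print(by_year)     # {(1997,): 22, (1998,): 3}
print(by_quarter)  # {(1997, 1): 15, (1997, 2): 7, (1998, 1): 3}
```

Rolling up is the same operation in reverse: drop the finest level and regroup.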

Star schema: a highly denormalized technique in which one fact table is associated with n dimension tables, so the diagram looks like a star.
Snowflake schema: if we apply normalization principles to a star schema, the result is known as a snowflake schema. In it, each dimension table can be associated with sub-dimension tables.

Star schema: In a star schema the fact table is in normalized format and the dimension tables are in denormalized format. It is also known as the basic star schema.

Snowflake schema:

Definition: A snowflake schema is a data warehouse schema that consists of a single fact table and multiple dimension tables, and these dimension tables are normalized. It is a variant of the star schema in which each dimension can have its own dimensions.

Snowflake Schema: A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.

Snowflake schema: In a snowflake schema both the dimension tables and the fact table are in normalized format. It is also known as the extended star schema.

A snowflake schema requires more tables and more foreign keys, which reduces query performance, but it normalizes the records.

The schema can be chosen depending on the requirements.

In a snowflake schema, the dimension tables are further normalized into different tables (the fact table is still single in this schema). This schema stores data in a more normalized form.

Advantages of Star Schema:

• Simplest DW schema
• Easy to understand
• Easy to navigate between the tables because fewer joins are needed
• Most suitable for query processing
Disadvantages:
• Occupies more space
• Highly denormalized

Advantages of Snowflake schema:

• The tables are easier to maintain
• Saves storage space
Disadvantages of Snowflake schema:

• Due to the large number of joins, it is more complex to navigate

Starflake schema: a hybrid structure that contains a mixture of (denormalized) STAR and (normalized) SNOWFLAKE schemas.

The choice depends on the scenario, such as how much data the data warehouse holds, but generally the star schema is preferred, because it is easier to understand and its fewer joins suit query processing.

Set 2
Q1. What is OLAP?
ANS:
OLAP stands for Online Analytical Processing, a category of tools used mainly for business reporting. Using OLAP, businesses can analyze data in many different ways, including budgeting, planning, simulation, data warehouse reporting, and trend analysis. A key feature of OLAP is its ability to perform multidimensional calculations quickly, which supports a wide range of analyses. In addition, the bigger the business, the bigger its business reporting needs. Multidimensional calculations enable a large business to get in seconds results it would otherwise have waited several minutes to receive.
OLAP is an approach to swiftly answering multidimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

Databases configured for OLAP use a multidimensional data model, allowing for
complex analytical and ad-hoc queries with a rapid execution time. They borrow
aspects of navigational databases and hierarchical databases that are faster
than relational databases.

The output of an OLAP query is typically displayed in a matrix (or pivot) format. The
dimensions form the rows and columns of the matrix; the measures form the values.

For example:

Sales Fact Table                   Time Dimension
+-------------+---------+          +---------+-------------------+
| sale_amount | time_id |          | time_id | timestamp         |
+-------------+---------+          +---------+-------------------+
|     2008.10 |    1234 |--------->|    1234 | 20080902 12:35:43 |
+-------------+---------+          +---------+-------------------+
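The matrix (pivot) display described above can be sketched in a few lines of Python. It puts one dimension on the rows and one on the columns, and each cell holds the summed measure. The region/year dimensions, the function name pivot and the amounts are hypothetical illustrations, not part of the example above.

```python
from collections import defaultdict

def pivot(facts, row_dim, col_dim, measure):
    """Arrange facts as a matrix: one dimension per axis, summed measure per cell."""
    cells = defaultdict(float)
    for fact in facts:
        cells[(fact[row_dim], fact[col_dim])] += fact[measure]
    rows = sorted({fact[row_dim] for fact in facts})
    cols = sorted({fact[col_dim] for fact in facts})
    # Empty combinations show as 0.0 instead of being left out.
    matrix = [[cells.get((r, c), 0.0) for c in cols] for r in rows]
    return rows, cols, matrix

# Hypothetical sales facts already joined to their region and year.
facts = [
    {"region": "North", "year": 2007, "sale_amount": 100.0},
    {"region": "North", "year": 2008, "sale_amount": 200.0},
    {"region": "North", "year": 2008, "sale_amount": 50.0},
    {"region": "South", "year": 2008, "sale_amount": 300.0},
]

rows, cols, matrix = pivot(facts, "region", "year", "sale_amount")
print("region".ljust(8) + "".join(str(c).rjust(10) for c in cols))
for r, values in zip(rows, matrix):
    print(r.ljust(8) + "".join(f"{v:10.2f}" for v in values))
```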

TYPES OF OLAP
MOLAP (Multidimensional OLAP)

MOLAP is the 'classic' form of OLAP and is sometimes referred to simply as OLAP. MOLAP stores data in optimized multidimensional array storage rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.
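The pre-computation step can be illustrated with a simplified Python sketch. The function precompute_cube is hypothetical, and it uses dictionaries where a real MOLAP engine would use multidimensional arrays. It aggregates the measure for every subset of the dimensions (all 2^n group-bys), so later queries are lookups rather than scans of the base data.

```python
from collections import defaultdict
from itertools import combinations

def precompute_cube(facts, dims, measure):
    """Aggregate the measure for every subset of dims (all 2**n cuboids)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for fact in facts:
                totals[tuple(fact[d] for d in group)] += fact[measure]
            cube[group] = dict(totals)
    return cube

# Made-up sample facts.
facts = [
    {"brand": "Acme", "country": "India", "units": 10},
    {"brand": "Acme", "country": "Brazil", "units": 7},
    {"brand": "Zen", "country": "India", "units": 5},
]

cube = precompute_cube(facts, ["brand", "country"], "units")
# After "processing", answering a query is a dictionary lookup.
print(cube[("brand",)][("Acme",)])  # 17
print(cube[()][()])                 # 22 (grand total)
```

The cost of this design is storage and load time: the number of cuboids doubles with each added dimension.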
ROLAP (Relational OLAP)
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.
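The idea that each slice or dice adds a WHERE condition can be sketched with Python's sqlite3 module. The single table sales(year, country, brand, units), the helper slice_and_dice and the rows are hypothetical. Column names cannot be passed as SQL parameters, so the sketch checks them against an allowlist and passes only the values as parameters.

```python
import sqlite3

ALLOWED_DIMENSIONS = {"year", "country", "brand"}  # hypothetical columns

def slice_and_dice(conn, filters):
    """Sum units; every slice/dice condition becomes one WHERE predicate."""
    for col in filters:
        if col not in ALLOWED_DIMENSIONS:
            raise ValueError(f"unknown dimension column: {col}")
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT SUM(units) FROM sales WHERE {where}"
    return conn.execute(sql, list(filters.values())).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, country TEXT, brand TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1997, "India", "Acme", 10), (1997, "Brazil", "Acme", 7),
    (1998, "India", "Zen", 5), (1997, "India", "Zen", 3),
])

print(slice_and_dice(conn, {}))                                  # whole cube
print(slice_and_dice(conn, {"year": 1997}))                      # slice
print(slice_and_dice(conn, {"year": 1997, "country": "India"}))  # dice
```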
HOLAP (Hybrid OLAP)
There is no clear agreement across the industry as to what constitutes "Hybrid
OLAP", except that a database will divide data between relational and specialized
storage. For example, for some vendors, a HOLAP database will use relational
tables to hold the larger quantities of detailed data, and use specialized storage for
at least some aspects of the smaller quantities of more-aggregate or less-detailed
data.

Q2. How is metadata different from a data warehouse?


ANS:
The term metadata is ambiguous: it is used for two fundamentally different concepts (types). Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be "data about the containers of data". Descriptive metadata, on the other hand, is about individual instances of application data, the data content.
Metadata (meta content) is traditionally found in the card catalogues of libraries. Describing the contents and context of data files greatly increases the quality of the original data/files. For example, a webpage may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.
A data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data, whereas business intelligence (BI) focuses on the use of data to facilitate reporting and analysis.

The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from various operational systems in an organization. The extracted data is integrated in the data warehouse environment in order to provide an enterprise-wide perspective, one version of the truth. Data is structured in a way to specifically address the reporting and analytic requirements.

An essential component of a data warehouse/business intelligence system is the metadata, together with the tools to manage and retrieve it. Ralph Kimball describes metadata as the DNA of the data warehouse, because metadata defines the elements of the data warehouse and how they work together.

Q3. What are metadata? Why are metadata so important to a data warehouse?

Metadata is data about data. Metadata has been around as long as there have been programs and data for the programs to operate on. The figure shows metadata in a simple form.

While metadata is not new, the role of metadata and its importance in the face of the
data warehouse certainly is new. For years the information technology professional has
worked in the same environment as metadata, but in many ways has paid little attention
to metadata. The information professional has spent a life dedicated to process and
functional analysis, user requirements, maintenance, architectures, and the like.

Metadata plays a very different role in the data warehouse. Relegating metadata to a passive, backwater role in the data warehouse environment defeats the purpose of the data warehouse. Metadata plays a very active and important part in the data warehouse environment.
The reason why metadata plays such an important and active role in the data warehouse
environment is apparent when contrasting the operational environment to the data
warehouse environment insofar as the user community is concerned.
The information technology professional is the primary community involved in the usage
of operational development and maintenance facilities. It is expected that the
information technology community is computer literate, and able to find his/her way
around systems. The community served by the data warehouse is a very different
community. The data warehouse serves the DSS analysis community. It is anticipated
that the DSS analysis community is not computer literate. Instead the expectation is
that the DSS analysis community is a businessperson community first, and a technology
community second.
Simply from the standpoint of who needs help the most in terms of finding one's way
around data and systems, it is assumed the DSS analysis community requires a much
more formal and intensive level of support than the information technology community.
For this reason alone, the formal establishment of and ongoing support of metadata
becomes important in the data warehouse environment.
But there is a secondary, yet important, reason why metadata plays an important role in
the data warehouse environment. In the data warehouse environment, the first thing the
DSS analyst needs to know in order to do his/her job is what data is available and where
it is in the data warehouse. In other words, when the DSS analyst receives an
assignment, the first thing the DSS analyst needs to know is what data there is that
might be useful in fulfilling the assignment. To this end the metadata for the warehouse
is vital to the preparatory work done by the DSS analyst.
Contrast the importance of the metadata to the DSS analyst to the importance of
metadata to the information technology professional. The information technology
professional has been doing his/her job for many years while treating metadata
passively.
