
CPSC 404 Tutorial # 2

Akiff Manji - x8n4


Evan Ng - s2o7
Question #1 to Answer and Hand In:
Why is it important to avoid FK violations when setting up a data warehouse? After all, a DW
isn't a real-time, OLTP, source system, so why bother with FKs?
Data warehouses often aggregate data from multiple sources, some of which are themselves
operational OLTP databases. As a result, the ETL effort can be substantial when dealing with
OLTP sources: conflicting schemas and differences in semantics, platforms, and integrity
constraints all have to be handled.
Furthermore, we're populating tables (i.e. loading data) from the SampleCurrencyData.txt flat
file, which references data tables from the AdventureWorksDW database in SSMS. Therefore,
the data from the flat file must obey the OLTP integrity constraints, or we run the risk of
loading incorrect data into the data warehouse and querying incorrect data from it, or worse
yet, failing to load anything at all.
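To make the risk concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table layout is heavily simplified, the AverageRate column and the rate values are assumptions for illustration, and key 999 is a made-up key that has no matching dimension row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.execute(
    "CREATE TABLE DimCurrency ("
    " CurrencyKey INTEGER PRIMARY KEY,"
    " CurrencyAlternateKey TEXT)"
)
conn.execute(
    "CREATE TABLE FactCurrencyRate ("
    " CurrencyKey INTEGER NOT NULL REFERENCES DimCurrency(CurrencyKey),"
    " AverageRate REAL)"  # simplified, assumed column
)
conn.execute("INSERT INTO DimCurrency VALUES (100, 'USD')")


def load_rate(currency_key, average_rate):
    """Try to load one fact row and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO FactCurrencyRate VALUES (?, ?)",
            (currency_key, average_rate),
        )
        return True
    except sqlite3.IntegrityError:
        # The fact row points at a currency that does not exist in the dimension.
        return False


ok_known = load_rate(100, 1.0)    # key 100 exists in DimCurrency
ok_unknown = load_rate(999, 1.0)  # hypothetical key with no dimension row
loaded = conn.execute("SELECT COUNT(*) FROM FactCurrencyRate").fetchone()[0]
```

With the constraint in place, the orphaned row is rejected at load time instead of silently ending up in the fact table, where it would later drop out of any join against the dimension.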
Question #2 to Answer and Hand In (2 Parts):
i. What is the purpose of this SQL query?
The purpose of this SQL query is to get all tuples of dbo.DimCurrency in AdventureWorksDW
whose CurrencyAlternateKey matches one of the listed three-letter currency codes. Essentially,
we're only interested in the rows corresponding to the 14 different currencies in the select
clause rather than all 105 in the dbo.DimCurrency table.
It works in two steps: first, all tuples of the dbo.DimCurrency table are returned and the
result is given the alias refTable. From this, the rows matching the CurrencyAlternateKey
codes are selected.
ii. What is the purpose of specifying refTable in the following SQL query? Is it needed?
It just renames dbo.DimCurrency to refTable so that we do not have to keep typing the full
name in the where clause of the select statement. So no, it is not needed. However, it also
serves as a convenient label that tells the DBA executing the query that dbo.DimCurrency is
the table being referenced.
We also observed that the nested select was unnecessary, as we could have kept just the
inner select statement and retrieved the same results.
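The observation about the nested select can be checked with a small sketch. It assumes the query had the shape described above (an inner select over the whole table aliased as refTable, filtered by an outer where clause); the actual tutorial query is not reproduced here. SQLite stands in for SQL Server, only two of the currency codes are used, and every key except 100 for USD is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimCurrency ("
    " CurrencyKey INTEGER PRIMARY KEY,"
    " CurrencyAlternateKey TEXT)"
)
# Sample rows only; the real table is described as holding 105 currencies.
conn.executemany(
    "INSERT INTO DimCurrency VALUES (?, ?)",
    [(1, "AUD"), (2, "CAD"), (3, "EUR"), (4, "JPY"), (100, "USD")],
)

# Assumed shape of the tutorial query: derived table with the alias refTable.
nested = conn.execute(
    "SELECT * FROM (SELECT * FROM DimCurrency) AS refTable "
    "WHERE refTable.CurrencyAlternateKey IN ('CAD', 'USD') "
    "ORDER BY CurrencyKey"
).fetchall()

# Same filter applied directly to the table, with no nesting and no alias.
flat = conn.execute(
    "SELECT * FROM DimCurrency "
    "WHERE CurrencyAlternateKey IN ('CAD', 'USD') "
    "ORDER BY CurrencyKey"
).fetchall()
```

Both queries return the same rows, which is why the alias and the extra nesting are a readability choice rather than a requirement.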

Question #3 to Answer and Hand In:


In a paragraph or two, in plain English (for a non-DB person), summarize what you did in the
last two steps (for CurrencyKey and TimeKey).
In the last two steps, we first read in the Flat File Source for Currency (i.e. extracted the
data). The flat file contains fields such as CurrencyID and CurrencyDate that can be mapped
to columns that exist in tables in AdventureWorksDW. We used these mappings to look up the
foreign keys into the dbo.DimCurrency and dbo.DimTime tables for the respective currencies
and time stamps in the flat file (i.e. transformed the data). We will use these keys to
populate other destination tables (i.e. FactCurrencyRate) of AdventureWorksDW, which can
then reference the existing table data in AdventureWorksDW.
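The lookup step can be sketched in a few lines of plain Python. This is an illustration of the idea, not the SSIS Lookup transformation itself: the two dictionaries stand in for dbo.DimCurrency and dbo.DimTime, and the key values, the AverageRate field, and the sample row are assumptions.

```python
from datetime import date

# Hypothetical lookups: business value from the flat file -> surrogate key.
currency_keys = {"USD": 100, "CAD": 19}
time_keys = {date(2001, 7, 1): 1, date(2001, 7, 2): 2}


def transform(row):
    """Swap CurrencyID and CurrencyDate for the matching CurrencyKey and TimeKey."""
    return {
        "CurrencyKey": currency_keys[row["CurrencyID"]],
        "TimeKey": time_keys[row["CurrencyDate"]],
        "AverageRate": row["AverageRate"],
    }


# One made-up flat-file row (the extract step would produce many of these).
flat_file_rows = [
    {"CurrencyID": "USD", "CurrencyDate": date(2001, 7, 1), "AverageRate": 1.0},
]

# Rows in this shape are what would be loaded into the fact table.
fact_rows = [transform(r) for r in flat_file_rows]
```

A row whose currency or date is missing from the lookups raises a KeyError here, which mirrors the case where a lookup finds no match and the row cannot be loaded.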
Question #4 to Answer and Hand In (4 Short Parts):
i. How many rows are in the FactCurrencyRate table?
1097
ii. What real-life data is referred to by CurrencyKey #100?
CurrencyKey is the foreign key to a currency in dbo.DimCurrency. It is just a numeric key
referencing the currency named US dollars, whose currency alternate key is USD.
iii. How about the TimeKey: What does a TimeKey of 1, 2, etc. refer to?
TimeKey references an exact calendar date in dbo.DimTime. In particular, it
references the particular date together with its numbered day of the week, name of the
weekday, numbered day of the month, name of the month, numbered day of the year,
numbered year, etc.
iv. In plain English, writing for a non-DB person, explain what row 1 of the
FactCurrencyTable tells us.
Row 1 tells us what the average and end-of-day exchange rates for US Dollars (USD) were on
2001-07-01 (a particular date in the calendar).
Question #5
[OLE DB Destination for DimDate [153]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE
DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult:
0x80004005 Description: "The statement has been terminated.".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult:
0x80004005 Description: "Cannot insert the value NULL into column 'MonthNumber', table
'OLAP_Target.dbo.DimDate'; column does not allow nulls. INSERT fails.".

Question #6
If you try to deploy the ETL process a second time, what error messages do you get? (It is
OK to do this, but verify that all 1188 rows are still there.)
[OLE DB Destination for DimProduct [127]] Error: SSIS Error Code DTS_E_OLEDBERROR. An
OLE DB error has occurred. Error code: 0x80040E2F.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult:
0x80040E2F Description: "The statement has been terminated.".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult:
0x80040E2F Description: "Violation of PRIMARY KEY constraint 'PK_DimProduct'. Cannot
insert duplicate key in object 'dbo.DimProduct'.".
Expanding the Tables folder in OLAP_Target in SSMS, we can right-click on dbo.DimDate ->
go to Properties -> click on the Storage option on the left-hand menu of the popup. There we
see that there are still 1188 rows in the table.
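A minimal sketch of why a second run fails yet leaves the row count unchanged, again using SQLite in place of SQL Server. The three product rows are made up, and the all-or-nothing transaction is an assumption about how the load behaves, not something taken from the SSIS package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Name TEXT)")

# Hypothetical source rows that every run of the load tries to insert.
source_rows = [(1, "Product A"), (2, "Product B"), (3, "Product C")]


def run_load():
    """Insert all source rows in one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO DimProduct VALUES (?, ?)", source_rows)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return "failed: " + str(exc)


first = run_load()   # table is empty, so the load succeeds
second = run_load()  # same keys again, so the primary key constraint rejects it
row_count = conn.execute("SELECT COUNT(*) FROM DimProduct").fetchone()[0]
```

The second run is rejected by the primary key constraint, so the table still holds exactly the rows from the first run.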
Question #7:
List all the foreign keys for the fact table, and indicate the reference column
(candidate key) and dimension table that corresponds to each FK.

Foreign Key     Candidate Key   Dimension Table
ShipDateKey     DateKey         dbo.DimDate
OrderDateKey    DateKey         dbo.DimDate
DueDateKey      DateKey         dbo.DimDate
CustomerKey     CustomerKey     dbo.DimCustomer
ProductKey      ProductKey      dbo.DimProduct

Question #8
Define the terms measure and measure group according to the way that they are used in
SQL Server. Specifically indicate whether or not they apply to fact tables, dimensions,
cubes, and other objects.
A measure represents a column in the cube that contains quantifiable data, usually numeric,
that can be aggregated. Every cube must have at least one measure, but most cubes have
many. Structurally, a measure is often mapped to a source column in the fact
table, with the column providing the values used to load the measure. Measures are
context-sensitive, operating on numeric data in a context that is determined by whichever
dimension members happen to be included in the query.
Measures are grouped by their underlying fact tables into measure groups. Measure groups
are used to associate dimensions with measures. Measure groups are also used for
measures that have distinct counts as their aggregation behavior. Placing each distinct count
measure into its own measure group optimizes aggregation processing.
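The context-sensitivity of a measure, and the difference between a summed measure and a distinct-count measure, can be illustrated with a toy fact table in plain Python. The rows, column positions, and measure names below are all hypothetical, and this is not how SSAS stores or computes aggregations.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, product, customer_id, sales_amount).
facts = [
    ("Canada", "Bike", 1, 100.0),
    ("Canada", "Helmet", 1, 20.0),
    ("Canada", "Bike", 2, 150.0),
    ("Australia", "Bike", 3, 200.0),
]


def sales_amount(group_by):
    """Sum measure, evaluated per member of the dimension column at index group_by."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[group_by]] += row[3]
    return dict(totals)


def customer_count(group_by):
    """Distinct-count measure: number of different customers per member."""
    seen = defaultdict(set)
    for row in facts:
        seen[row[group_by]].add(row[2])
    return {member: len(ids) for member, ids in seen.items()}


by_country = sales_amount(0)            # same measure, sliced by country
by_product = sales_amount(1)            # same measure, sliced by product
customers_by_country = customer_count(0)
```

The same sales measure yields different numbers depending on which dimension members frame the query. The distinct count cannot be obtained by adding up smaller totals, since customer 1 appears in two rows but counts once, which hints at why such measures are handled separately.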
Question #9
What is meant by the phrase "deploy and process a cube"? In particular, what major events
take place in Microsoft's SSAS when doing so? One or two paragraphs are fine.
Deploying an Analysis Services project creates the defined objects in an instance of Analysis
Services. In other words, once the structure of the cube is created in BIDS, deploying the
cube creates the actual structure of it on an SQL analysis server. Processing the objects in an
instance of Analysis Services involves copying the data from the underlying data sources into
the cube objects. In other words, once the cube is deployed onto the SSAS server the data
needs to be loaded onto the structure and the aggregations need to be generated.
Question #10 to Answer and Hand In:
What is the purpose of creating a hierarchy such as Customer Geography? After all, we
already have all of its attributes in the list of attributes, and we should be able to access any
of these fields without problem.
Hierarchies, in tabular models, are metadata that define relationships between two or more
columns in a table. Hierarchies can appear separate from other columns in a reporting
client field list, making them easier for client users to navigate and include in a report.
Hierarchies improve the overall user experience:
- Hierarchies can provide a simple, intuitive view of an otherwise complex data
  structure. For example, in our case, we have created a logical relationship from
  Country-Region -> State-Province -> City.
- Hierarchy levels can make it easier for users to find and include levels in a
  report. Renaming a level does not rename the column it references; it simply makes
  the level more identifiable.
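What the hierarchy buys the user is a ready-made drill path. The sketch below imitates drilling from Country-Region down to State-Province in plain Python; the rows and sales figures are invented, and the roll_up helper is a hypothetical illustration, not an SSAS feature.

```python
# Hypothetical customer rows: (country, state_province, city, sales).
rows = [
    ("Canada", "British Columbia", "Vancouver", 120.0),
    ("Canada", "British Columbia", "Victoria", 80.0),
    ("Canada", "Ontario", "Toronto", 200.0),
    ("Australia", "New South Wales", "Sydney", 300.0),
]

# Level order taken from the hierarchy described above.
LEVELS = ["Country-Region", "State-Province", "City"]


def roll_up(level_name):
    """Total sales at one hierarchy level, keyed by the path from the top level."""
    depth = LEVELS.index(level_name) + 1
    totals = {}
    for row in rows:
        path = row[:depth]  # e.g. ("Canada", "Ontario") at the State-Province level
        totals[path] = totals.get(path, 0.0) + row[3]
    return totals


by_country = roll_up("Country-Region")
by_state = roll_up("State-Province")
```

Without the hierarchy, a user would have to know that these three separate attributes belong together and in which order; with it, each level is one step below the previous one.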

Question #11 to Answer and Hand In (2 Parts):


i. What is the purpose of a Named Calculation in SQL Server?
A Named Calculation is an SQL expression represented as a calculated column. This
expression appears and behaves as a column in the table. A named calculation lets you
extend the relational schema of existing tables or views in a data source view without
modifying the tables or views in the underlying data source. It also allows one to quickly
specify new columns in the cube that are more informative than what the underlying data
table provides.
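As a rough analogy only, a SQL view behaves much like a data source view with a named calculation: it exposes an extra expression-based column while the stored table stays untouched. The sketch uses SQLite, a simplified DimCustomer table, and a made-up customer; the view name CustomerView is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimCustomer ("
    " CustomerKey INTEGER PRIMARY KEY,"
    " FirstName TEXT,"
    " LastName TEXT)"
)
conn.execute("INSERT INTO DimCustomer VALUES (1, 'Ada', 'Lovelace')")  # made-up row

# The view plays the role of the data source view: FullName exists only as an
# expression evaluated at query time, not as a stored column.
conn.execute(
    "CREATE VIEW CustomerView AS "
    "SELECT *, FirstName || ' ' || LastName AS FullName FROM DimCustomer"
)

# Column names of the underlying table (index 1 of each table_info row).
table_columns = [col[1] for col in conn.execute("PRAGMA table_info(DimCustomer)")]
view_row = conn.execute("SELECT CustomerKey, FullName FROM CustomerView").fetchone()
```

The underlying table still has only its three original columns, yet anything reading through the view sees FullName as if it were an ordinary column.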
ii. Why don't we just change the underlying source table instead of creating a Named
Calculation?
This would require us to re-deploy the cube (a time constraint if the data is large enough),
and in some cases even rebuild the cube altogether. Moreover, it would introduce
redundancy and require extra storage space in the original dataset that may not have been
intended to be present in the first place.
Question #12 to Answer and Hand In:
What is the purpose of including the customer's name in a geography hierarchy? This
seems to have very little to do with geography.
Just like adding other attributes to the hierarchies, it allows the user to define a relationship
between a customer and where they live. This may be important for example when the user
wants to quickly determine the geographical location of their customers.
Question #13:
Explain what advantage is given by creating a Display Folder for the above situations.
It groups relevant attributes into a folder that can subsequently be accessed in the cube's
dimension browser. If a dimension contains many attributes, this improves the user
experience when they are looking for attributes that fall under the same general category but
aren't necessarily organized in a hierarchical structure. The alternative is sifting through
all of the attributes sorted in alphabetical order.
Question #14 to Answer and Hand In (2 Parts):
i. We just defined a rigid example, but give a good example of a field that benefits from
being flexible.
Notice that we kept the Full Name attribute as flexible, since a customer may move to
another location over time. However, it's safe to assume that a particular City will always
have a relation to some Province or State, and these in turn will always have a relationship to
some country.

Another example of a field that benefits from being flexible is Occupation. A customer is
likely to change their occupation over time, so if we defined a relationship between
occupation and name, for example, it would be more appropriate to keep it flexible.
ii. How does the specification of flexible vs. rigid help with data warehouse performance?
When a dimension is updated (for example, when it is reprocessed after deployment), flexible
aggregations are dropped and subsequently rebuilt, while rigid aggregations are retained. Not
having to rebuild all aggregations equates to a reduction in processing time.
Question #15 to Answer and Hand In:
What is the difference between a member and an attribute in SSAS?
Attributes correspond to a collection of members in the dimension where the data is
queried from. Members represent the values that the attribute can take on, that is, the
discrete values contained in the dimension in which the attributes are defined.
Question #16 to Answer and Hand In:
In plain English, explain the purpose of the DATENAME function.
The purpose of the DATENAME function is to return a chosen part of a date or timestamp
(such as the month or the weekday) as human-readable text.
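For readers who do not know T-SQL, here is a rough Python analogue of what DATENAME does for a few date parts. The date_name helper is hypothetical, covers only four parts, and hard-codes English names; it is not a reimplementation of the SQL Server function.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)
WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
)


def date_name(part, d):
    """Return the requested part of date d as text, loosely like DATENAME."""
    if part == "year":
        return str(d.year)
    if part == "month":
        return MONTHS[d.month - 1]
    if part == "weekday":
        return WEEKDAYS[d.weekday()]  # weekday() is 0 for Monday
    if part == "day":
        return str(d.day)
    raise ValueError("unsupported date part: " + part)


sample = date(2001, 7, 1)
month_text = date_name("month", sample)
weekday_text = date_name("weekday", sample)
```

The point to notice is that every result is a string, including the year and the day, which is what makes the output suitable for building labels.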
Question #17 to Answer and Hand In:
Simplify the previous CalendarSemesterDesc script to be as short as possible (except for the
whitespace which is for the DBA's benefit).
'H' + CONVERT(CHAR(1), CalendarSemester) + ' CY ' + CONVERT(CHAR(4), CalendarYear)
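As a cross-check of the intended result, the same label can be built in Python. This sketch assumes the target format is "H<semester> CY <year>", for example "H1 CY 2005"; the function name and the sample year are hypothetical.

```python
def calendar_semester_desc(calendar_semester, calendar_year):
    """Build a semester label such as 'H1 CY 2005' from two integers."""
    # One expression covers both semesters, so no per-semester branch is needed.
    return "H" + str(calendar_semester) + " CY " + str(calendar_year)


first_half = calendar_semester_desc(1, 2005)
second_half = calendar_semester_desc(2, 2005)
```

The simplification rests on the fact that the semester number itself can be converted to text, so the two cases differ only in that one digit.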
Question #18 to Answer and Hand In:

Here we're asking a basic question: at which education level, and in which country, are users
making the most purchases online? Given that individuals are increasingly turning to online
marketplaces, this is a great way to determine which demographic to market to, or where
marketing is perhaps not as effective. It appears that individuals with at least some form of
postsecondary education (such as a bachelor's) tend to make the most purchases online.
Interestingly, the highest number comes from Australia. We've drilled down our search to
education level and country. We also added a filter to determine whether these purchasers
are predominantly male, female, other, or perhaps none of them.
