Escolar Documentos
Profissional Documentos
Cultura Documentos
Page 1 sur 5
Kimball Forum
Hi,
We have the following case:
For the product code in the source data, lookup the product natural key in the product
dimension and assign the surrogate key returned by the lookup to the fact. If the
lookup fails, assign the unknown product whose natural key is "Unknown". The
unknown product is entered initially in the dimension and should never change
afterwards, we use negative SID for default entries (-1 for "unknown", -2 for "not
applicable" and -3 for "not available").
In the ETL design, there is a debate on how to implement this, some want to hard code
the SID for all dimension with default values (ex: if the lookup fails put the value -1 in
the key field). However, I have doubts since the ETL has to validate at all times that
the DB integrity is not compromised (integrity is turned off in the DB). My opinion is
that the "Unknown" dimension entry should always be retrieved for a "default" lookup
in the ETL to verify this integrity. This would result in the following ETL:
1. Lookup the product SID in the product dimension using the natural key
2. If the product is found, assign the product SID to the fact
3. Else lookup the default SID using the "Unknown" natural key and assign the SID
retrieved. (a similarl lookup can be done for "not applicable" or "not available)
This method works for all dimensions and guarantees the integrity of the database.
This also protects against "hard coding" a surrogate key which doesn't sound too good
imho... Even if the default keys will always be -1, -2 and -3 we should not use this
values directly.
Any thoughts on this topic?
Thanks in advance,
Mathieu
MLaforge
Posts: 2
Join date: 2010-04-07
http://forum.kimballgroup.com/t486-assign-default-value-for-a-dimension
2012-04-23
Page 2 sur 5
For the product code in the source data, lookup the product natural key in the
product dimension and assign the surrogate key returned by the lookup to the fact.
You lookup the product natural key.... so how is it the natural key is unknown?
If you cannot find a natural key in a dimension table you insert a new row WITH THAT
NATURAL KEY. You do not throw it away and use some generic value. Now, you may
set attributes, such as product description, to 'Unknown', but you need to retain the
identity of the dimension row at all costs.
You need to ask yourself why would such a situation occur... assuming the application
system has a product master that it refers to in transactions? It usually happens
because something went wrong with the dimension update (job failed or wasn't run).
You need to retain the product identity in the dimensions so that if and when the
source system product master is updated, it would properly populate dimension
attributes for that row.
ngalemmo
Posts: 1687
Join date: 2009-05-15
Location: Los Angeles
http://forum.kimballgroup.com/t486-assign-default-value-for-a-dimension
2012-04-23
Page 3 sur 5
Mathieu
MLaforge
Posts: 2
Join date: 2010-04-07
The only time you may hard-code a surrogate key reference is in the case of a NULL
natural key, but from an ETL standpoint it is easier to simply hard code a default
natural key value when you encounter nulls and allow the normal surrogate key lookup
process to handle it.
But the point I am trying to make is if you receive a transaction that deals with product
"XYZ" and you don't know what "XYZ" is (i.e. it is not in the dimension table), you need
to create an entry for "XYZ" so, at the very least, you know what product ID the
numbers refer to. Without that, how do you reconcile things if there are 10 different
facts for different unknown products all pointing to the same dimension row?
ngalemmo
Posts: 1687
Join date: 2009-05-15
Location: Los Angeles
There is no "one size fits all" for solving this problem. But hard-coding a default value
(e.g. -1) is near the bottom of my list. Here is my list of potential solutions to the
problem of how to handle RI failures between fact and dimension:
Halt processing and abort. Hard to imagine this is a good idea.
Throw away the bad fact row. I can imagine a handful of scenarios where this
could be what you want to do, but rarely.
Divert the fact row to an error table. This is a common, simple technical solution.
It should be implemented ONLY if you develop policies and procedures for getting
the bad fact rows out of jail.
http://forum.kimballgroup.com/t486-assign-default-value-for-a-dimension
2012-04-23
Page 4 sur 5
Joy
Posts: 19
Join date: 2009-02-03
Location: Kimball
Group
If you are not able to update dimension table, you can create one more reference table
which can be inserted and updated and will hold negative running number (like
surogate key) for each product which was not found in dimension table. You need to
follow below steps
1. lookup on existing product dimension table if not found then lookup on newly added
refence table, If lookup failed with respect to new reference table then insert one
record with negative running number and product information.
2. When next time same product will come ( which not found in dimension table) that
will be found in reference lookup table and negative running number will be passed to
fact table.
This solution has 3 benifits
1. Allows every data load into fact table.
2. No need to use single default value which leads confusion in aggregation.
3. No data integrity issues in fact table.
While reprots you can take union of product dimension and new reference lookup table
and then join with fact to get full data.
http://forum.kimballgroup.com/t486-assign-default-value-for-a-dimension
2012-04-23
Page 5 sur 5
Cheers,
Sachin
sac587118
Posts: 2
Join date: 2010-12-10
Similar topics
Assign default value for a dimension
Dummy Row in Dimension Tables
SSAS Cube structure of SCD Type 2 dimension
Date Dimension
Dimension table with no business key?
Home
Forum free | phpBB | Free forum support | Report an abuse | Free forums
http://forum.kimballgroup.com/t486-assign-default-value-for-a-dimension
2012-04-23