Escolar Documentos
Profissional Documentos
Cultura Documentos
The method that we are going to consider here assumes the fact that, not all the attributes of a
dimension table are rapidly changing in nature. There might be a few attributes which are changing
quite often and some other attributes which seldom change. If we can separate the fast changing
attributes from the slowly changing ones and move them in some other table while maintaining the
slowly changing attributes in the same table, we can get rid of the issue of bulking up the dimension
table.
So lets take one example to see how it works. Lets say CUSTOMER dimension has following columns:
CUSTOMER_KEY
CUSTOMER_NAME
CUSTOMER_GENDER
CUSTOMER_MARITAL_STATUS
CUSTOMER_TIER
CUSTOMER_STATUS
While attributes like name, gender, marital status etc. do not change at all or rarely change, lets
assume customer tier and status change every month based on customers buying pattern. If we
decide to keep status and tier in the same SCD Type 2 Customer dimension table, we could risk fillingup the table too much too soon. Instead, we can pull out those two attributes in yet another table,
which some people refer as JUNK DIMENSION. Here is how our junk dimension will look like. In this
case, it will have 3 columns as shown below.
SEGMENTATION_KEY
TIER
STATUS
The column SEGMENTATION_KEY is a surrogate key. This acts as the primary key of the table. Also
since we have removed status and tier from our main dimension table, the dimension table now looks
like this:
CUSTOMER_KEY
CUSTOMER_NAME
CUSTOMER_GENDER
CUSTOMER_MARITAL_STATUS
Next, we must create a linkage between the above customer dimension to our newly created JUNK
dimension. Note here, we can not simply pull the primary key of the JUNK dimension (which we are
calling as SEGMENTATION_KEY) into the customer dimension as foreign key. Because if we do so, then
any change in JUNK dimension will require us to create a new record in Customer dimension to refer to
the changed key. This would in effect again increase the data volume of the dimension table. We solve
this problem by creating one more mini table in between the original customer dimension and the junk
dimension. This mini dimension table acts as a bridge between them. We also put start date and
end date columns in this mini table so that we can track the history. Here is how our new mini table
looks like:
CUSTOMER_KEY
SEGMENTATION_KEY
START_DATE
END_DATE
This table does not require any surrogate key. However, one may include one CURRENT FLAG column
in the table if required. Now the whole model looks like this:
Junk Dimension
In data warehouse design, frequently we run into a situation where there are
yes/no indicator fields in the source system. Through business analysis, we
know it is necessary to keep such information in the fact table. However, if
keep all those indicator fields in the fact table, not only do we need to build
many small dimension tables, but the amount of information stored in the fact
table also increases tremendously, leading to possible performance and
management issues.
Junk dimension is the way to solve this problem. In a junk dimension, we
combine these indicator fields into a single dimension. This way, we'll only
need to build a single dimension table, and the number of fields in the fact
table, as well as the size of the fact table, can be decreased. The content in
the junk dimension table is the combination of all possible values of the
individual indicator fields.
Let's look at an example. Assuming that we have the following fact table:
In this example, the last 3 fields are all indicator fields. In this existing format,
each one of them is a dimension. Using the junk dimension principle, we can
combine them into a single junk dimension, resulting in the following fact
table:
Note that now the number of dimensions in the fact table went from 7 to 5.
The content of the junk dimension table would look like the following:
In this case, we have 3 possible values for the TXN_CODE field, 2 possible
values for the COUPON_IND field, and 2 possible values for the
PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk
dimension table.
By using a junk dimension to replace the 3 indicator fields, we have
decreased the number of dimensions by 2 and also decreased the number of
fields in the fact table by 2. This will result in a data warehousing environment
that offer better performance as well as being easier to manage.