
ETL with SSIS 101

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me donabel.santos@queryworks.ca


whoami? aka shameless plugs

 Donabel Santos
 SQL Geek (I love SQL Server!), PowerShell enthusiast
 Principal, QueryWorks Solutions
 Training, Consulting
 Instructor, BC Institute of Technology
 Blogger – sqlmusings.com
 Tweep – @sqlbelle
 Author – some articles + upcoming PowerShell/SQL Server book (summer 2012)
Agenda

 What ETL is
 SSIS Refresher
 Demo, demo, demo
 SSIS Tips

Materials

 Presentation materials will be at my blog
 http://www.sqlmusings.com
 Demo files
 Slides
 Questions? Comments? Email me
 donabel.santos@queryworks.ca

ETL

Extract Transform Load (ETL)

 A way to bridge “source” systems and “destination” systems
 Mostly used for cleaning data and loading a data warehouse

Lifecycle: Extract, Transform, Load

Data Warehousing ETL
 Understanding the data – what needs to be captured and reported
   Also understanding your own data quality
 Naming conventions
   Schemas are also helpful (stg, etl, tmp, dwh)
 Auditing
   When packages last ran, when records were last updated, etc.
 NULL values
   Not acceptable in a DW
   Will cause rows to be lost when joins are done, because NULL never matches a key
   Map them to an “Unknown” default value in the dimensions
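
The NULL point above can be shown with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the table names (dim_city, stg_sales) and the -1 key for the “Unknown” member are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension with an 'Unknown' default member (key -1 is an assumption)
    CREATE TABLE dim_city (city_sk INTEGER PRIMARY KEY, city_name TEXT);
    INSERT INTO dim_city VALUES (-1, 'Unknown'), (1, 'Vancouver'), (2, 'Richmond');

    -- Hypothetical staging rows; sale 2 has no city
    CREATE TABLE stg_sales (sale_id INTEGER, city_name TEXT, amount REAL);
    INSERT INTO stg_sales VALUES (1, 'Vancouver', 100.0), (2, NULL, 50.0), (3, 'Richmond', 75.0);
""")

# Naive lookup: NULL never equals anything, so sale 2 silently drops out of the join
naive_total = conn.execute("""
    SELECT SUM(s.amount)
    FROM stg_sales s
    JOIN dim_city d ON d.city_name = s.city_name
""").fetchone()[0]

# Safer lookup: map NULL to the 'Unknown' member before joining
safe_total = conn.execute("""
    SELECT SUM(s.amount)
    FROM stg_sales s
    JOIN dim_city d ON d.city_name = COALESCE(s.city_name, 'Unknown')
""").fetchone()[0]

print(naive_total)  # 175.0 (sale 2 is missing)
print(safe_total)   # 225.0 (sale 2 is counted under 'Unknown')
```

In an SSIS package the same mapping would typically happen in the data flow before the dimension lookup; the sketch only shows why the mapping matters.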

Data Warehousing ETL
 Staging
   Good for auditing, recoverability, and releasing sources (tables, files, etc.)
 Checksum
   Allows you to easily figure out if any records have changed
   Helpful for Slowly Changing Dimensions (SCD)
 Surrogate Key (SK) vs Business Key/Alternate Key (BK or AK)
   SK is a “meaningless” number that is the PK in the DW
   Faster; no dependency on source keys; avoids key collisions
   BK or AK is the PK in the source
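
As a rough illustration of the checksum idea, the sketch below hashes the tracked columns of each incoming row and compares the result with the value stored alongside the dimension row. The business keys (C-1001, C-1002), the column list, and the choice of SHA-256 are all assumptions for the example, not something prescribed by the slides.

```python
import hashlib

def row_checksum(row, columns):
    """Hash the tracked columns of a row into one comparable value."""
    # Note: None and the empty string hash the same here; fine for a sketch
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

TRACKED = ["FName", "City"]  # columns whose changes we care about (assumed)

# Dimension rows keyed by business key (BK), each with its surrogate key (SK)
# and the checksum computed when the row was last loaded
dimension = {
    "C-1001": {
        "SK": 1,
        "checksum": row_checksum({"FName": "Jamie", "City": "Vancouver"}, TRACKED),
    },
}

incoming = [
    {"BK": "C-1001", "FName": "Jamie", "City": "Richmond"},  # city changed
    {"BK": "C-1002", "FName": "Alex", "City": "Burnaby"},    # never seen before
]

new_rows, changed_rows, unchanged_rows = [], [], []
for row in incoming:
    existing = dimension.get(row["BK"])
    if existing is None:
        new_rows.append(row["BK"])
    elif existing["checksum"] != row_checksum(row, TRACKED):
        changed_rows.append(row["BK"])
    else:
        unchanged_rows.append(row["BK"])

print(new_rows)      # ['C-1002']
print(changed_rows)  # ['C-1001']
```

Comparing one stored checksum per row avoids comparing every column individually, which is why it pairs well with SCD processing.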

Data Warehousing ETL – Dealing with Changes
 Kimball – Slowly Changing Dimensions (SCD)
 SCD Type 1
   Only keep the last value
   How: UPDATE
   When to use: correcting errors, no need to keep history, limited storage

Before:
SK  FName  City
1   Jamie  Vancouver

After:
SK  FName  City
1   Jamie  Richmond
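
A minimal sketch of a Type 1 change, using Python's sqlite3 module as a stand-in for the warehouse. The table name dim_customer and the helper apply_type1 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (SK INTEGER PRIMARY KEY, FName TEXT, City TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Jamie', 'Vancouver')")

def apply_type1(conn, sk, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute("UPDATE dim_customer SET City = ? WHERE SK = ?", (new_city, sk))

apply_type1(conn, 1, "Richmond")

rows = conn.execute("SELECT SK, FName, City FROM dim_customer").fetchall()
print(rows)  # [(1, 'Jamie', 'Richmond')]
```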

Data Warehousing ETL – Dealing with Changes
 Kimball – Slowly Changing Dimensions (SCD)
 SCD Type 2
   Keep all historical values
   How: insert new records, update start and end dates
   When to use: you want to keep history

SK  FName  City       StartDate   EndDate
1   Jamie  Vancouver  2012-01-01  2012-03-10
2   Jamie  Richmond   2012-03-10  NULL
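
A minimal sketch of a Type 2 change, again with sqlite3 standing in for the warehouse. The BK column, the business key C-1001, the helper apply_type2, and the change date of 2012-03-10 are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        SK INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        BK TEXT,                               -- business key from the source (assumed)
        FName TEXT, City TEXT,
        StartDate TEXT, EndDate TEXT)
""")
conn.execute(
    "INSERT INTO dim_customer (BK, FName, City, StartDate, EndDate) "
    "VALUES ('C-1001', 'Jamie', 'Vancouver', '2012-01-01', NULL)"
)

def apply_type2(conn, bk, fname, new_city, change_date):
    """Type 2: close the current row, then insert a new current row."""
    conn.execute(
        "UPDATE dim_customer SET EndDate = ? WHERE BK = ? AND EndDate IS NULL",
        (change_date, bk),
    )
    conn.execute(
        "INSERT INTO dim_customer (BK, FName, City, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, NULL)",
        (bk, fname, new_city, change_date),
    )

apply_type2(conn, "C-1001", "Jamie", "Richmond", "2012-03-10")

rows = conn.execute(
    "SELECT SK, FName, City, StartDate, EndDate FROM dim_customer ORDER BY SK"
).fetchall()
for row in rows:
    print(row)
```

The row with EndDate NULL is the current version; every earlier version keeps its own surrogate key, so facts loaded before the change still point at the old city.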

Data Warehousing ETL – Dealing with Changes
 Kimball – Slowly Changing Dimensions (SCD)
 SCD Type 3
   Keep only the current value, plus the last value before it
   How: one more column that holds the previous value

SK  FName  PreviousCity  EffectiveDate  CurrentCity
1   Jamie  Vancouver     2012-03-10     Richmond
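
A minimal sketch of a Type 3 change with sqlite3. The helper apply_type3 and the initial EffectiveDate of 2012-01-01 are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        SK INTEGER PRIMARY KEY, FName TEXT,
        PreviousCity TEXT, EffectiveDate TEXT, CurrentCity TEXT)
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Jamie', NULL, '2012-01-01', 'Vancouver')")

def apply_type3(conn, sk, new_city, effective_date):
    """Type 3: shift the current value into the 'previous' column, then overwrite."""
    # In a SQL UPDATE, the right-hand sides see the row as it was before the update,
    # so PreviousCity receives the old CurrentCity
    conn.execute(
        "UPDATE dim_customer "
        "SET PreviousCity = CurrentCity, CurrentCity = ?, EffectiveDate = ? "
        "WHERE SK = ?",
        (new_city, effective_date, sk),
    )

apply_type3(conn, 1, "Richmond", "2012-03-10")

rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(rows)  # [(1, 'Jamie', 'Vancouver', '2012-03-10', 'Richmond')]
```

Only one prior value survives: a second move would overwrite Vancouver with Richmond in PreviousCity.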

SSIS REFRESHER

BIDS Tour

SSIS Components – Data Flow
Source → Transformation → Destination
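
The source, transformation, destination shape of a data flow can be sketched as a chain of Python generators. This is only an analogy for how rows stream through the pipeline, not how SSIS is implemented; the function names and the sample data are made up.

```python
import csv
import io

def source(text):
    """Source: read rows from delimited text."""
    yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Transformation: trim and title-case names, default missing cities."""
    for row in rows:
        city = row["City"].strip()
        yield {
            "FName": row["FName"].strip().title(),
            "City": city if city else "Unknown",
        }

def destination(rows, target):
    """Destination: append rows to the target and report how many it holds."""
    for row in rows:
        target.append(row)
    return len(target)

raw = "FName,City\n jamie ,Vancouver\nalex,\n"
loaded = []
count = destination(transform(source(raw)), loaded)

print(count)   # 2
print(loaded)  # [{'FName': 'Jamie', 'City': 'Vancouver'}, {'FName': 'Alex', 'City': 'Unknown'}]
```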

Common Transformations

Variables

Expressions

 Allow you to have more dynamic (and powerful) SSIS packages
 Allow you to set properties at runtime
 “Configurable” properties depend on which component you’re working with

Expressions

Expressions
 Checks if a field is NULL
   ISNULL(RegistrationDate)
 Checks if SellStartDate falls after June (July or later)
   DATEPART("mm",SellStartDate) > 6
 Calculates list price as a percentage of standard cost
   ROUND(ListPrice / StandardCost,2) * 100
 Checks if product is a manufactured finished good and size is not a specified bike size
   MakeFlag == TRUE && FinishedGoodsFlag == TRUE && Size != (DT_WSTR,10)@BikeSize
 Ternary (conditional) operator
   boolean_expression ? value_if_true : value_if_false
 Capitalizes the first letter of Color
   UPPER(SUBSTRING(Color,1,1)) + LOWER(SUBSTRING(Color,2,15))
 File name with today’s date
   "C:\\SSIS\\File_" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".txt"
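
For readers who want to check the logic of these expressions outside SSIS, here is a rough Python rendering of each one. The function names are invented for the sketch, and it assumes the DT_DBDATE cast renders the date as YYYY-MM-DD.

```python
from datetime import date

def is_null(value):
    # ISNULL(RegistrationDate)
    return value is None

def sells_after_june(sell_start_date):
    # DATEPART("mm",SellStartDate) > 6 is true from July onward, not for June itself
    return sell_start_date.month > 6

def price_to_cost_percent(list_price, standard_cost):
    # ROUND(ListPrice / StandardCost,2) * 100
    return round(list_price / standard_cost, 2) * 100

def is_non_bike_finished_good(make_flag, finished_goods_flag, size, bike_size):
    # MakeFlag == TRUE && FinishedGoodsFlag == TRUE && Size != (DT_WSTR,10)@BikeSize
    return make_flag and finished_goods_flag and size != str(bike_size)

def capitalize_color(color):
    # UPPER(SUBSTRING(Color,1,1)) + LOWER(SUBSTRING(Color,2,15))
    # SSIS SUBSTRING is 1-based, so positions 2..16 map to the slice [1:16]
    return color[:1].upper() + color[1:16].lower()

def dated_file_name(today):
    # "C:\\SSIS\\File_" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".txt"
    return "C:\\SSIS\\File_" + today.isoformat() + ".txt"

print(sells_after_june(date(2012, 6, 30)))   # False
print(sells_after_june(date(2012, 7, 1)))    # True
print(price_to_cost_percent(150.0, 100.0))   # 150.0
print(capitalize_color("sILVER"))            # Silver
print(dated_file_name(date(2012, 3, 10)))    # C:\SSIS\File_2012-03-10.txt
```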

Our mission for today, should we choose to accept it …

MISSION POSSIBLE

Sources
Excel 2007

Access 2007

XML
MySQL

Text File

Mission Accomplished

GENERAL SSIS TIPS

Tip: Package Configuration Files

Tip: EncryptSensitiveWithUserKey

Tip: Credentials and Proxies

What we covered

 What ETL is
 SSIS Refresher
 Demo, demo, demo
 SSIS Tips

ETL/SSIS Resources
 BI Developer Network
 http://www.bidn.com/
 Jamie Thomson – SSIS Junkie
 http://sqlblog.com/blogs/jamie_thomson/
 Pragmatic Works
 http://pragmaticworks.com/
 Matt Masson – SSIS Team Blog
 http://blogs.msdn.com/b/mattm/
 Andy Leonard
 http://sqlblog.com/blogs/andy_leonard/

Presentation materials will be at my blog
http://www.sqlmusings.com
Demo files
Slides
Cheatsheets
Questions? Comments? Email me
donabel.santos@queryworks.ca

THANK YOU!
