This eBook does not grant you legal rights to ownership of any Microsoft product, only to its use, unless explicitly stated otherwise in the document.
You can copy and use this whitepaper for your projects, labs, and other needs.
Ed Price
Gokan Ozcifci
Durval Ramos
Naomi Nosonovsky
Saeid Hasani
What is TechNet Wiki?
The TechNet Wiki is a library of information about Microsoft technologies, written by the community for
the community. Whether you write code, manage servers, keep mission-critical sites up and running, or
just enjoy digging into details, we think you will be at home in the TechNet Wiki.
- This is a community site. For official documentation, see the MSDN Library or the TechNet Library, or contact Microsoft Support.
- The Wiki is focused on Microsoft technologies. The community will edit or remove topics that get too far off track.
- We are inspired by Wikipedia.
- Anyone who joins can participate and contribute content.
The simplest way to participate is to use the information in this Wiki. The community provides how-to guides, troubleshooting tips and techniques, practical usage scenarios, scripting pointers, and more. We welcome your feedback; head over to the TechNet Wiki Discussion forum.
Naomi Nosonovsky: Microsoft Certified IT Professional in SQL Server 2005 (Database Developer); an IT professional with more than 15 years of experience in a variety of programming languages and technologies.
This book was put together by Saeid Hasani with the help of Durval Ramos Junior, Naomi Nosonovsky and
Ronen Ariely (aka pituach).
The editors of this eBook thank all TechNet Wiki members who contributed their content to the Microsoft TechNet Wiki.
Contents
T-SQL USEFUL LINKS ............................................................................................................................................. 20
SELECT TOP N ROWS PER GROUP ....................................................................................................................................20
PERFORMANCE OPTIMIZATION .......................................................................................................................................20
EXECUTE VS SP_EXECUTESQL ......................................................................................................................................20
SQL SERVER INTERNALS ................................................................................................................................................20
DYNAMIC SEARCH ........................................................................................................................................................21
OPTION RECOMPILE......................................................................................................................................................21
DATES ........................................................................................................................................................................21
CALENDAR TABLE .........................................................................................................................................................21
GAPS AND ISLANDS .......................................................................................................................................................21
CONCURRENCY ............................................................................................................................................................21
PARAMETER SNIFFING ...................................................................................................................................................22
CURSORS ....................................................................................................................................................................22
INFORMATION ABOUT ALL OBJECTS ..................................................................................................................................22
STRING MANIPULATIONS ...............................................................................................................................................22
STRING SPLIT ...............................................................................................................................................................22
XML .........................................................................................................................................................................22
CONCATENATE ROWS ...................................................................................................................................................22
COMMON TABLE EXPRESSION.........................................................................................................................................23
CTE PERFORMANCE .....................................................................................................................................................23
CTE SYNTACTIC SUGAR ..................................................................................................................................................23
CTE VERSUS TEMP TABLE ..............................................................................................................................................23
PIVOT .......................................................................................................................................................................23
UNPIVOT ..................................................................................................................................................................23
RUNNING TOTAL ..........................................................................................................................................................23
ASP.NET ...................................................................................................................................................................23
OTHER TOPICS .............................................................................................................................................................24
HIERARCHICAL TABLE SORTING WITH A PARENT-CHILD RELATION ...................................................................... 26
PROBLEM ...................................................................................................................................................................26
SOLUTION ...................................................................................................................................................................26
APPLY OPERATOR IN SQL SERVER ........................................................................................................................ 30
INTRODUCTION ............................................................................................................................................................30
APPLY OPERATORS ......................................................................................................................................................30
USING THE CODE .........................................................................................................................................................30
TOP OPERATOR ..........................................................................................................................................................31
T-SQL: APPLYING APPLY OPERATOR .................................................................................................................... 32
PROBLEM DESCRIPTION .................................................................................................................................................32
SOLUTION ...................................................................................................................................................................33
SQL SERVER 2012 SOLUTION ........................................................................................................................................35
CONCLUSION ...............................................................................................................................................................35
ADDENDUM ................................................................................................................................................................35
FIXING MISSING DATA BASED ON PRIOR ROW INFORMATION ............................................................................ 36
SQL SERVER PIVOT ............................................................................................................................................... 39
PROBLEM DEFINITION ...................................................................................................................................................39
COMMON PROBLEM .....................................................................................................................................................40
OTHER BLOGS .............................................................................................................................................................40
T-SQL: DISPLAY HORIZONTAL ROWS VERTICALLY ................................................................................................ 41
HOW TO DISPLAY DYNAMICALLY HORIZONTAL ROWS VERTICALLY............................................................................................41
T-SQL: DYNAMIC PIVOT ON MULTIPLE COLUMNS ................................................................................................ 44
HOW TO MAKE A DYNAMIC PIVOT ON MULTIPLE COLUMNS .................................................................................................44
ADDITIONAL RESOURCES ...............................................................................................................................................46
T-SQL: CREATE REPORT FOR LAST 10 YEARS OF DATA .......................................................................................... 47
PROBLEM DEFINITION ...................................................................................................................................................47
SOLUTION ...................................................................................................................................................................49
CONCLUSION ...............................................................................................................................................................50
T-SQL: RELATIONAL DIVISION .............................................................................................................................. 52
INTRODUCTION ............................................................................................................................................................52
PROBLEM DEFINITION ...................................................................................................................................................52
SOLUTIONS .................................................................................................................................................................52
BEST EXACT MATCH SOLUTION .......................................................................................................................................56
SLIGHT VARIATION OF THE ORIGINAL PROBLEM..................................................................................................................57
CONCLUSION ...............................................................................................................................................................60
MICROSOFT SQL SERVER 2012 NEW FUNCTIONS ................................................................................................. 62
EOMONTH ...............................................................................................................................................................62
CHOOSE ...................................................................................................................................................................62
CONCAT ...................................................................................................................................................................62
LAST_VALUE AND FIRST_VALUE ...............................................................................................................................62
LEAD ........................................................................................................................................................................63
EOMONTH() FUNCTION USAGE IN SQL SERVER 2012 AND ON ............................................................................. 64
HOW SQL SERVER DETERMINES TYPE OF THE CONSTANT .................................................................................... 66
PROBLEM DEFINITION ...................................................................................................................................................66
EXPLANATION ..............................................................................................................................................................66
CONCLUSION ...............................................................................................................................................................66
UNDERSTANDING NOLOCK QUERY HINT ............................................................................................................. 67
SET ANSI_PADDING SETTING AND ITS IMPORTANCE ........................................................................................... 73
PROBLEM DESCRIPTION .................................................................................................................................................73
INVESTIGATION ............................................................................................................................................................73
RESOLUTION ...............................................................................................................................................................73
SCRIPT TO CORRECT PROBLEM IN THE WHOLE DATABASE ......................................................................................................75
DEFAULT DATABASE SETTINGS ........................................................................................................................................76
ALL-AT-ONCE OPERATIONS IN T-SQL .................................................................................................................... 77
INTRODUCTION ............................................................................................................................................................77
DEFINITION .................................................................................................................................................................78
PROS AND CONS ..........................................................................................................................................................81
CAUTION .................................................................................................................................................................83
EXCEPTION..................................................................................................................................................................89
CONCLUSION ...............................................................................................................................................................89
SQL SERVER COLUMNSTORE INDEX FAQ .............................................................................................................. 90
CONTENTS ..................................................................................................................................................................90
1. OVERVIEW ..............................................................................................................................................................90
2. CREATING A COLUMNSTORE INDEX ..............................................................................................................................91
3. LIMITATIONS ON CREATING A COLUMNSTORE INDEX ...............................................................................................95
4. MORE DETAILS ON COLUMNSTORE TECHNOLOGY ...........................................................................................................95
5. USING COLUMNSTORE INDEXES...................................................................................................................................99
6. MANAGING COLUMNSTORE INDEXES .........................................................................................................................102
7. BATCH MODE PROCESSING ......................................................................................................................................107
SQL SERVER COLUMNSTORE PERFORMANCE TUNING ....................................................................................... 110
INTRODUCTION ..........................................................................................................................................................110
FUNDAMENTALS OF COLUMNSTORE INDEX-BASED PERFORMANCE ......................................................................................110
DOS AND DON'TS FOR USING COLUMNSTORES EFFECTIVELY .............................................................................................111
MAXIMIZING PERFORMANCE AND WORKING AROUND COLUMNSTORE LIMITATIONS ..............................................................112
ENSURING USE OF THE FAST BATCH MODE OF QUERY EXECUTION.......................................................................................112
PHYSICAL DATABASE DESIGN, LOADING, AND INDEX MANAGEMENT ....................................................................................112
MAXIMIZING THE BENEFITS OF SEGMENT ELIMINATION .....................................................................................................112
ADDITIONAL TUNING CONSIDERATIONS ..........................................................................................................................112
T-SQL: SIMPLIFIED CASE EXPRESSION ................................................................................................................ 114
INTRODUCTION ..........................................................................................................................................................114
DEFINITION ...............................................................................................................................................................114
DETERMINE OUTPUT DATA TYPE ....................................................................................................................................117
DETERMINE OUTPUT NULL-ABILITY ...............................................................................................................................118
PERFORMANCE ..........................................................................................................................................................125
IS NULL AND OR.......................................................................................................................................................125
CASE ......................................................................................................................................................................126
COALESCE ..............................................................................................................................................................127
ISNULL ...................................................................................................................................................................128
DYNAMIC SQL...........................................................................................................................................................129
COALESCE ..............................................................................................................................................................131
ISNULL ...................................................................................................................................................................132
XML .......................................................................................................................................................................132
CHOOSE .................................................................................................................................................................133
UDF FUNCTION .........................................................................................................................................................134
PERMANENT LOOKUP TABLE ........................................................................................................................................135
MORE READABILITY ....................................................................................................................................................136
CONCLUSION .............................................................................................................................................................137
STRUCTURED ERROR HANDLING MECHANISM IN SQL SERVER 2012 .................................................................. 138
PROBLEM DEFINITION..................................................................................................................................................138
INTRODUCTION ..........................................................................................................................................................138
SOLUTION .................................................................................................................................................................138
CORRECT LINE NUMBER OF THE ERROR! ..........................................................................................................................147
EASY TO USE..............................................................................................................................................................148
COMPLETE TERMINATION.............................................................................................................................................148
INDEPENDENCE OF SYS.MESSAGES .................................................................................................................................150
XACT_ABORT .........................................................................................................................................................155
@@TRANCOUNT ...................................................................................................................................................155
CONCLUSION .............................................................................................................................................................155
ERROR HANDLING WITHIN TRIGGERS USING T-SQL ........................................................................................... 156
PROBLEM DEFINITION..................................................................................................................................................156
SOLUTION .................................................................................................................................................................157
CONCLUSION .............................................................................................................................................................161
CUSTOM SORT IN ACYCLIC DIGRAPH ................................................................................................................. 162
PROBLEM DEFINITION..................................................................................................................................................162
VOCABULARY ............................................................................................................................................................162
SOLUTION .................................................................................................................................................................162
PATINDEX CASE SENSITIVE SEARCH ................................................................................................................... 166
REMOVE LEADING AND TRAILING ZEROS ........................................................................................................... 167
T-SQL: HOW TO FIND ROWS WITH BAD CHARACTERS........................................................................................ 168
CONCLUSION.............................................................................................................................................................170
RANDOM STRING .............................................................................................................................................. 171
INTRODUCTION ..........................................................................................................................................................171
SOLUTIONS ...............................................................................................................................................................171
CONCLUSIONS AND RECOMMENDATIONS ........................................................................................................................178
SORT LETTERS IN A PHRASE USING T-SQL .......................................................................................................... 180
PROBLEM DEFINITION..................................................................................................................................................180
INTRODUCTION ..........................................................................................................................................................180
SOLUTION .................................................................................................................................................................180
LIMITATIONS .............................................................................................................................................................182
T-SQL: DATE-RELATED QUERIES ......................................................................................................................... 184
FINDING DAY NUMBER FROM THE BEGINNING OF THE YEAR ...............................................................................................184
FINDING BEGINNING AND ENDING OF THE PREVIOUS MONTH .............................................................................................184
HOW TO FIND VARIOUS DAY, CURRENT WEEK, TWO WEEK, MONTH, QUARTER, HALF YEAR AND YEAR IN SQL SERVER ....... 185
DATE COMPUTATION ..................................................................................................................................................185
FINDING CURRENT DATE..............................................................................................................................................185
FINDING START DATE AND END DATE OF THE WEEK .........................................................................................................185
FINDING END DATE OF THE WEEK .................................................................................................................................185
FINDING START DATE AND END DATE OF THE TWO WEEKS ................................................................................................186
FINDING START DATE AND END DATE OF THE CURRENT MONTH .........................................................................................186
FINDING START DATE AND END DATE OF THE CURRENT QUARTER .......................................................................187
FINDING START DATE AND END DATE FOR HALF YEAR.......................................................................................................187
FINDING START DATE AND END DATE FOR YEAR ..............................................................................................................188
SQL SERVER: HOW TO FIND THE FIRST AVAILABLE TIMESLOT FOR SCHEDULING................................................ 189
CREATE SAMPLE DATA ................................................................................................................................................189
T-SQL: GROUP BY TIME INTERVAL...................................................................................................................... 191
SIMPLE PROBLEM DEFINITION ......................................................................................................................................191
SOLUTION .................................................................................................................................................................191
COMPLEX PROBLEM DEFINITION AND SOLUTION ..............................................................................................................191
AVOID T (SPACE) WHILE GENERATING XML USING FOR XML CLAUSE ................................................................ 193
GENERATE XML WITH SAME NODE NAMES USING FOR XML PATH .................................................................... 195
GENERATE XML - COLUMN NAMES WITH THEIR VALUES AS TEXT() ENCLOSED WITHIN THEIR COLUMN NAME TAG ....... 197
SQL SERVER XML: SORTING DATA IN XML FRAGMENTS ..................................................................................... 198
PROBLEM DEFINITION .................................................................................................................................................198
APPROACHES .............................................................................................................................................................198
PROBLEM SOLUTION ...................................................................................................................................................200
CONCLUSION .............................................................................................................................................................202
TERMINOLOGY ...........................................................................................................................................................202
HOW TO EXTRACT DATA IN XML TO MEET THE REQUIREMENTS OF A SCHEMA ................................................. 203
INTRODUCTION ..........................................................................................................................................................203
PROBLEM .................................................................................................................................................................203
CAUSES ....................................................................................................................................................................204
DIAGNOSTIC STEPS .....................................................................................................................................................204
BUILDING THE SCENARIO OF THE PROBLEM .....................................................................................................................204
SOLUTION .................................................................................................................................................................205
ADDITIONAL INFORMATION ..........................................................................................................................................206
CREDITS ...................................................................................................................................................................206
REFERENCES ..............................................................................................................................................................207
TECHNET LIBRARY ......................................................................................................................................................207
T-SQL SCRIPT TO UPDATE STRING NULL WITH DEFAULT NULL ........................................................................... 209
FIFO INVENTORY PROBLEM - COST OF GOODS SOLD ......................................................................................... 211
DIFFERENT METHODS OF CALCULATING COST OF GOODS SOLD IN THE INVENTORY CALCULATION ...............................................211
IMPLEMENTING FIFO COST OF GOODS SOLD IN OUR APPLICATION.......................................................................................211
CURRENT PROCEDURE TO CALCULATE COST OF GOODS ON HAND ........................................................................................215
FIFO COST OF GOODS SOLD ........................................................................................................................................223
THE COST OF GOODS SOLD FIFO PROCEDURE .................................................................................................................233
SUMMARY ................................................................................................................................................................252
T-SQL: GAPS AND ISLANDS PROBLEM ................................................................................................................ 253
PROBLEM DEFINITION .................................................................................................................................................253
SOLUTION .................................................................................................................................................................253
CRAZY TSQL QUERIES PLAY TIME ....................................................................................................................... 255
BACKGROUND ...........................................................................................................................................................255
PLAYING WITH JOIN & UNION ...................................................................................................................................255
UNION USING JOIN ..................................................................................................................................................255
INNER JOIN USING SUB QUERY ................................................................................................................................256
LEFT JOIN USING SUB QUERY & UNION ...................................................................................................................256
RIGHT JOIN WE CAN QUERY USING LEFT JOIN ..............................................................................................................257
FULL OUTER JOIN USING "LEFT JOIN" UNION "RIGHT JOIN" ....................................................................................257
FULL OUTER JOIN USING SUB QUERY & UNION .......................................................................................................257
PLAYING WITH NULL ..................................................................................................................................................258
ISNULL USING COALESCE .........................................................................................................................................258
COALESCE USING ISNULL .........................................................................................................258
PLAYING WITH CURSOR AND LOOPS ...............................................................................................................................258
CURSOR USING WHILE LOOP (WITHOUT USING CURSOR) ...................................................................................................258
REFERENCES & RESOURCES ..........................................................................................................................................260
REGEX CLASS...................................................................................................................................................... 262
SQL SERVER RESOURCE RE-BALANCING IN FAILOVER CLUSTER .......................................................................... 264
SQL SERVER: CREATE RANDOM STRING USING CLR ........................................................................................... 267
INTRODUCTION ..........................................................................................................................................................267
RESOURCES ...............................................................................................................................................................268
HOW TO COMPARE TWO TABLES DEFINITION / METADATA IN DIFFERENT DATABASES .................................... 270
T-SQL: SCRIPT TO FIND THE NAMES OF STORED PROCEDURES THAT USE DYNAMIC SQL ................................... 272
T-SQL SCRIPT TO GET DETAILED INFORMATION ABOUT INDEX SETTINGS .......................................................... 273
HOW TO CHECK WHEN INDEX WAS LAST REBUILT ............................................................................................. 277
SQL SCRIPT FOR REBUILDING ALL THE TABLES’ INDEXES ..................................................................................................277
HOW TO GENERATE INDEX CREATION SCRIPTS FOR ALL TABLES IN A DATABASE USING T-SQL.......................... 278
T-SQL: FAST CODE FOR RELATIONSHIP WITHIN THE DATABASE ......................................................................... 280
HOW TO CHECK THE SYNTAX OF DYNAMIC SQL BEFORE EXECUTION ................................................................. 281
USING BULK INSERT TO IMPORT INCONSISTENT DATA FORMAT (USING PURE T-SQL) ....................................... 284
INTRODUCTION ..........................................................................................................................................................284
THE PROBLEM............................................................................................................................................................284
OUR CASE STUDY .......................................................................................................................................................284
THE SOLUTION: ..........................................................................................................................................................285
STEP 1: IDENTIFY THE IMPORT FILE FORMAT ...................................................................................................................285
STEP 2: INSERT THE DATA INTO TEMPORARY TABLE ..........................................................................................................290
STEP 3: PARSING THE DATA INTO THE FINAL TABLE ............................................................................................................292
SUMMARY ................................................................................................................................................................293
COMMENTS ..............................................................................................................................................................294
RESOURCES ...............................................................................................................................................................295
CHAPTER 1:
T-SQL Useful Links
This article shares a collection of links covering various aspects of the Transact-SQL language. Many of
these links come in very handy when answering questions in SQL Server related forums.
Optimizing TOP N per Group Queries - blog by Itzik Ben-Gan explaining various optimization
ideas.
Including an Aggregated Column's Related Values - this blog presents several solutions to the
problem, with explanations for each.
Including an Aggregated Column's Related Values - Part 2 - the second blog in the series with
use cases for the previous blog.
Performance Optimization
Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic - interesting
and novel blog by Denis Gobo.
Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code -
very impressive blog by Denis Gobo.
Slow in the Application, Fast in SSMS? - comprehensive long article by Erland Sommarskog.
Performance consideration when using a Table Variable - Peter Larsson article.
LEFT JOIN vs NOT EXISTS - performance comparison by Gail Shaw.
EXECUTE vs sp_ExecuteSQL
Avoid Conversions In Execution Plans By Using sp_executesql Instead of Exec - by Denis Gobo.
Changing exec to sp_executesql doesn't provide any benefit if you are not using parameters
correctly - by Denis Gobo.
Do you use ISNULL(...). Don't, it does not perform - short blog by Denis Gobo.
Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later) - long and
comprehensive article by Erland Sommarskog.
Catch All Queries - short blog by Gail Shaw.
Sunday T-SQL tip: How to select data with unknown parameter set - nice blog by Dmitri
Korotkevich.
Relevant MSDN forum's thread
Is this worth the effort - Discussion about NULL integer parameters.
Option Recompile
Dates
Dear ISV: You’re Keeping Me Awake Nights with Your VARCHAR() Dates
The ultimate guide to the datetime datatypes - very long and comprehensive article by Tibor
Karaszi.
Bad habits to kick : mis-handling date / range queries - from the Aaron Bertrand Series of Bad
Habits to Kick
Date Range WHERE Clause Simplification - article by Erik E.
Weekly data thread
T-SQL: Date Related Queries - Naomi's TechNet WiKi article.
How to get the first and last day of the Month, Quarter, Year
Calendar Table
Concurrency
Cursors
The Truth about Cursors - Part 1 - Series of blogs about cursors by Brad Schulz.
The Truth about Cursors - Part 2
The Truth about Cursors - Part 3
String Manipulations
Handy String Functions - several functions emulating VFP functions by Brad Schulz.
MSDN thread about RegEx in T-SQL
CLR RegEx - interesting series about CLR RegEx
Create Random String - 7 different options including CLR code.
String Split
XML
Concatenate Rows
CTE Performance
PIVOT
UNPIVOT
Running Total
ASP.NET
Getting the identity of the most recently added record - Mikesdotnetting blog.
How to insert information into multiple related tables and return ID using SQLDataSource
How to Avoid SQL Injection Attack - Long FAQ on ASP.NET forum.
SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!
Other Topics
Design Decisions
Blocking problems
NOT IN problem
JOIN problem
Why LEFT JOIN doesn't bring records from the LEFT table
Orphans check
UPDATE FROM
CTE
Hierarchical Table Sorting with a Parent-Child Relation
Problem
Given the following table Accounts
the requirement is a query that sorts the table based on the parent-child hierarchy; more
precisely, each child must appear directly under its parent, as shown below.
Think of it as a depth-first search, where the children are sorted in the alphabetical order.
Go as far down the left-most branch as you can, then move one branch to the right. So the children of
John have to be listed before carrying on listing the children of Alex.
Solution
This solution uses a recursive CTE to build the hierarchy and, at each level, orders by name. If you leave
the [path] column in the final select, you will see how it has been built up; it is used to order the final
result set.
declare @Accounts table (AccountID int, name varchar(50), ParentID int)
;with cte as
(
select
Accountid,
name,
parentid,
cast(row_number()over(partition by parentid order by name) as varchar(max))
as [path],
0 as level,
row_number()over(partition by parentid order by name) / power(10.0,0) as x
from @Accounts
where parentid = 0
union all
select
t.AccountID,
t.name,
t.ParentID,
[path] + '-' + cast(row_number()over(partition by t.parentid order by t.name) as varchar(max)),
level+1,
x + row_number()over(partition by t.parentid order by t.name) /
power(10.0,level+1)
from
cte
join @Accounts t on cte.AccountID = t.ParentID
)
select
Accountid,
name,
ParentID,
[path],
x
from cte
order by x
This gives:
Accountid name ParentID path x
--------- --------- -------- ------ --------------------
1 Alex 0 1 1.000000000000000000
8 George 1 1-1 1.100000000000000000
2 John 1 1-2 1.200000000000000000
3 Mathew 2 1-2-1 1.210000000000000000
6 Shine 2 1-2-2 1.220000000000000000
7 Tom 2 1-2-3 1.230000000000000000
4 Philip 1 1-3 1.300000000000000000
5 Shone 0 2 2.000000000000000000
9 Jim 5 2-1 2.100000000000000000
The [path] column shows the account's position in the hierarchy. For example, 'Shine' has the path
'1-2-2', which means the second child of the second child of the first root account: Shine is the
second child of John, who in turn is the second child of Alex.
CHAPTER 3:
Apply Operator
APPLY Operator in SQL Server
Introduction
The APPLY operator is a new feature in SQL Server 2005, and TOP received some new enhancements in SQL
Server 2005 as well. We will discuss these two operators in this article.
APPLY Operators
The APPLY operator is a new feature in SQL Server 2005 used in the FROM clause of a query. It allows you
to invoke a table-valued function for each row of the outer table, and the outer table's columns can be
passed as function arguments.
The query below returns, for each customer, the rows of the function matching cust.CustomerID. To
execute the code below, you need the two database tables listed below with some data in them.
--Use APPLY
SELECT * FROM Customer cust
CROSS APPLY
fnGetCustomerInfo(cust.CustomerID)
ORDER BY cust.CustName
TOP Operator
In SQL Server 2005, TOP is used to restrict the number of rows returned, as a number or a percentage, in
SELECT, UPDATE, DELETE or INSERT statements. Earlier, this was possible only in a SELECT query. This
enhanced feature replaces SET ROWCOUNT, which had performance issues.
Note: the expression should be of type bigint when specifying a row count and float when using the PERCENT option.
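As a brief hedged sketch of the enhanced TOP syntax (the table dbo.Orders and its columns are illustrative names, not from this article):

```sql
DECLARE @n BIGINT = 10;

-- Row-count form: the parenthesized expression must be bigint-compatible
SELECT TOP (@n) *
FROM dbo.Orders
ORDER BY OrderDate DESC;

-- PERCENT form: the expression is evaluated as float
SELECT TOP (25) PERCENT *
FROM dbo.Orders
ORDER BY OrderDate DESC;

-- TOP in DML (new in SQL Server 2005); without ORDER BY the chosen rows
-- are arbitrary, so use this for batched deletes, not ordered ones
DELETE TOP (1000)
FROM dbo.OrderArchive;
```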
This article originates from the following MSDN Transact SQL Forum's question: Complex logic to be
implemented in SQL - Please help! and I hope I made a pun with its title.
In my solution to the problem presented by the thread's originator I am going to show how to
use OUTER APPLY operator to solve common problems.
Problem Description
The first idea that comes to mind is that since we would need to expand ranges of dates we would need
a Calendar table with all the months. There are many common date related queries scenarios that
benefit from the permanent Calendar table in each database, as well as a Numbers table. You may want
to check this excellent article explaining why it is important to have such a Calendar table: Why should I
consider a Calendar table? For this particular problem we only need one row per month,
so we can either generate such a table on the fly or select from our existing Calendar table. While
working on this article I discovered that the database I used to create the Enrollments table didn't have
a permanent Calendar table, so I used this quick script to generate it for the purpose of solving the
original problem:
INTO #Tally
FROM Master.dbo.SysColumns sc1
,Master.dbo.SysColumns sc2
So with that script we prepared the Calendar table with one row per month from 1900-01-01 through
2019-12-01.
With that table in place I can now proceed with solving the problem we wanted to solve.
We need to create the start and end date for each enrollment and then join with the Calendar table to
expand ranges. The start date is obviously the enrollment date and the end date is either the date one
month prior to the next enrollment date for that Student or the first day of the current month.
Therefore I used the obvious idea here, which I have used many times in the past for similar kinds of
problems:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month,
DATEDIFF(month, '19000101',CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT *
FROM cte;
I've added SELECT * FROM cte so we can examine the intermediate result and verify that the logic is
correct.
Now we only need to add a JOIN to Calendar table to get the desired result with expanded ranges:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month,
DATEDIFF(month, '19000101',CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT S.StudentId, Cal.the_date AS Enroll_Date, S.Class
FROM cte S INNER JOIN dbo.Calendar
Cal ON Cal.the_date BETWEEN S.Start_Date AND S.End_Date;
SQL Server 2012 Solution
SQL Server 2012 and up offers a simpler alternative to the OUTER APPLY solution. In SQL Server 2012
the LEAD() and LAG() functions were introduced that allow us to avoid correlated subquery and
transform that solution into this code:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,DATEADD(month, -1,LEAD(S.Enroll_Date, 1, DATEADD(day, 1,
EOMONTH(CURRENT_TIMESTAMP))) OVER
(PARTITION BY S.StudentId ORDER BY S.Enroll_Date)) AS End_Date
,S.Class
FROM Enrollments S
In this solution I also used the new EOMONTH() function in order to advance one month from the
current month for the default value in the LEAD function. Then we're subtracting one month from that
expression as a whole.
Conclusion
In this article we learned how to apply simple T-SQL tricks to solve a problem. We covered two solutions:
one that works only in SQL Server 2012 and above, and another that may be used in prior versions of
SQL Server.
Addendum
Today's Transact SQL Server MSDN Forum post "Dynamic Columns with some additional logic" is an
interesting continuation of this article theme and also my other T-SQL: Dynamic Pivot on Multiple
Columns article. In my reply to the thread's originator I hinted at a possible solution using ideas from
both articles. Please leave a comment on this article if you want that case to become a new article or
part of this one.
Fixing Missing Data Based on Prior Row Information
One of the commonly asked problems in the Transact-SQL forum is how to fill in missing
information based on the first prior row that has data (or, alternatively, on the next row
by date). One of the examples where this problem was discussed is this thread.
In this thread the original poster was kind enough to provide the DDL and DML (data sample), so it
was easy to define a solution based on OUTER APPLY:
CREATE TABLE [dbo].[test_assign] (
[name] [varchar](25) NULL
,[datestart] [date] NULL
,[dateEnd] [date] NULL
,[assign_id] [int] IDENTITY(1, 1) NOT NULL
,CONSTRAINT [PK_test_assign] PRIMARY KEY CLUSTERED ([assign_id] ASC) WITH (
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
The idea of this solution is to use a correlated OUTER APPLY subquery to get the first measurement date
prior to the start date in the main table.
A similar problem is also described in this thread, and its solution is likewise a variation of the CROSS
APPLY solution. So, you can see that this problem is very common.
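The prior-row lookup idea can be sketched as follows. The dbo.Measurements table (name, meas_date, value) is a hypothetical companion table invented for illustration; test_assign comes from the DDL above.

```sql
-- For each assignment, pull the latest measurement taken on or before its
-- start date; OUTER APPLY keeps assignments that have no prior measurement.
SELECT a.assign_id,
       a.name,
       a.datestart,
       m.meas_date,
       m.value
FROM dbo.test_assign AS a
OUTER APPLY (
    SELECT TOP (1) ms.meas_date, ms.value
    FROM dbo.Measurements AS ms
    WHERE ms.name = a.name
      AND ms.meas_date <= a.datestart
    ORDER BY ms.meas_date DESC
) AS m;
```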
CHAPTER 4:
Pivot
SQL Server PIVOT
Problem Definition
Recently in this thread I helped solve a relatively simple problem. I will quote my solution and then
explain the main problem people often encounter with PIVOT solutions.
;WITH CTE_STC_DETAIL_CODES AS
(
SELECT
[Code_V_2].[CODE_CAT],
[Code_V_2].[DESCRIPTION]
FROM [dbo].[STC_Detail]
INNER JOIN [STC_Header_V_2]
ON [STC_Header_V_2].[STCID] = [STC_Detail].[STCID]
INNER JOIN [STC_Code]
ON [STC_Code].[STCDTLID] = [STC_Detail].[STCDTLID]
INNER JOIN [Code_V_2]
ON [Code_V_2].[CodeID] = [STC_Code].[CodeID]
WHERE [STC_Header_V_2].[STC] = '33 '
)
SELECT [STCDTLID],
[SN] AS 'Sub Net',
[NT] AS 'Network Indicator',
[CV] AS 'Coverage Level',
[TQ] AS 'Time Period Qualifier',
[AI] AS 'Authorization Indicator',
[CS] AS 'Cost Share Type',
[IC] AS 'Insurance Certificate Code',
[QQ] AS 'Quantity Qualifier Code'
FROM CTE_STC_DETAIL_CODES
PIVOT
(
MAX([DESCRIPTION])
FOR CODE_CAT IN
(
[SN],
[NT],
[CV],
[TQ],
[AI],
[CS],
[IC],
[QQ]
)) AS Pvt
Common Problem
The pivot solution by itself is not complex; it is a simple static PIVOT. But the thread originator was
having trouble arriving at it. The main point to understand is that all columns of the pivot source that
are not mentioned in the PIVOT clause are used for implicit grouping. So if the source table contains a
column with unique values that is not referenced in the PIVOT clause, it becomes part of the grouping,
and the result will have as many rows as there are unique values in that column, defeating the main
purpose of the PIVOT.
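A minimal sketch of this pitfall, with a hypothetical table variable (not from the original thread):

```sql
-- RowID is an identity column that is NOT referenced in the PIVOT clause,
-- so it silently becomes part of the implicit grouping: every source row
-- stays a separate result row instead of collapsing into one.
DECLARE @t TABLE (RowID INT IDENTITY(1,1), Category CHAR(2), Descr VARCHAR(20));
INSERT INTO @t (Category, Descr) VALUES ('SN', 'Subnet A'), ('NT', 'In-network');

SELECT RowID, [SN], [NT]
FROM (SELECT RowID, Category, Descr FROM @t) AS src
PIVOT (MAX(Descr) FOR Category IN ([SN], [NT])) AS pvt;

-- The fix is to project only the needed columns before pivoting:
SELECT [SN], [NT]
FROM (SELECT Category, Descr FROM @t) AS src
PIVOT (MAX(Descr) FOR Category IN ([SN], [NT])) AS pvt;
```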
Other Blogs
There are two blog posts that may help in understanding PIVOT better:
by George Mastros
Table 1:
Table 2:
ColumnName 1 2
DEPARTMENT A/C SALES
EMPID 1 2
ENAME TEST1 TEST2
SALARY 2000 3000
The code block below transforms a result set in Table 1 format into Table 2 format.
--Dynamic unpivoting
--Assumption: @xmldata (XML) holds the source rows, e.g. captured with:
--DECLARE @xmldata XML = (SELECT * FROM Table1 FOR XML PATH(''));
SELECT * INTO ##temp FROM (
SELECT
ROW_NUMBER()OVER(PARTITION BY ColumnName ORDER BY ColumnValue) rn,* FROM (
SELECT i.value('local-name(.)','varchar(100)') ColumnName,
i.value('.','varchar(100)') ColumnValue
FROM @xmldata.nodes('//*[text()]') x(i) ) tmp ) tmp1
--SELECT * FROM ##temp
--Dynamic pivoting
DECLARE @Columns NVARCHAR(MAX),@query NVARCHAR(MAX)
SELECT @Columns = STUFF(
(SELECT ', ' +QUOTENAME(CONVERT(VARCHAR,rn)) FROM
(SELECT DISTINCT rn FROM ##temp ) AS T FOR XML PATH('')),1,2,'')
SET @query = N'
SELECT ColumnName,' + @Columns + '
FROM
(
SELECT * FROM ##temp
) i
PIVOT
(
MAX(ColumnValue) FOR rn IN ('
+ @Columns
+ ')
) j ;';
EXEC (@query)
--PRINT @query
DROP TABLE ##temp
T-SQL: Dynamic Pivot on Multiple Columns
The problem of transposing rows into columns is one of the most common problems discussed in the
MSDN Transact-SQL forum, and the question of creating a dynamic pivot often comes to light. One thing
that many people who ask this question forget is that such transposing is much easier to perform on the
client side than on the server, where we need to resort to a dynamic query. However, if we do want to
build such a pivot dynamically, the important thing to understand is that writing the dynamic query is
only slightly more difficult than writing the static query. In fact, when I am presented with the problem
of a dynamic pivot, I first figure out how the static query should look. Turning that query into a dynamic
one then becomes a rather trivial task.
I had written on the topic of dynamic pivot on multiple columns before in this blog post: Dynamic PIVOT
on multiple columns .
I don't want to repeat what I already wrote in that blog, so this article shows another example, from the
most recent thread on the topic of dynamic pivot.
In that thread I presented the following solution to the problem of a dynamic pivot for an unknown
number of columns:
USE tempdb
SET @i = 0;
SET @SQL = '';
PRINT @SQL;
EXECUTE (@SQL);
In this solution the first step was figuring out the static solution using the ROW_NUMBER() with partition
approach. This is a CASE-based pivot, although we could have used the true PIVOT syntax here instead.
A CASE-based pivot is easier to use when we need to transpose multiple columns. Once we knew the
static pivot, we were able to easily turn it into a dynamic one using a WHILE loop.
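The original script is not reproduced in this eBook; the following is a hedged sketch of a static CASE-based pivot using the ROW_NUMBER() partition approach (the Enrollments table from the earlier chapter is reused purely as an illustration):

```sql
-- Static CASE-based pivot: one MAX(CASE ...) expression per target column,
-- grouped by the row key. The dynamic version builds these expressions in a
-- WHILE loop and executes the resulting string.
SELECT StudentId,
       MAX(CASE WHEN rn = 1 THEN Class END) AS Class1,
       MAX(CASE WHEN rn = 2 THEN Class END) AS Class2,
       MAX(CASE WHEN rn = 3 THEN Class END) AS Class3
FROM (
    SELECT StudentId,
           Class,
           ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY Enroll_Date) AS rn
    FROM Enrollments
) AS src
GROUP BY StudentId;
```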
Just for completion, I also show the same problem solved using PIVOT syntax:
SET @i = 0;
PRINT @SQL;
EXECUTE (@SQL);
As you see, the code is very similar to the first solution, but using PIVOT syntax instead of CASE based
pivot.
I hope to add more samples to this article as new opportunities present themselves.
There was another recent question about dynamic PIVOT where this article solution was right on target.
This entry participated in the Technology Guru TechNet WiKi for May contest and won the Gold prize.
Additional Resources
Dynamic PIVOT on multiple columns
T-SQL: Create Report for Last 10 Years of Data
Recently in the MSDN Transact-SQL forum thread Please help with dynamic pivot query/ CTE/
SSRS I provided a solution for a very common scenario of generating report for last N (10 in that
particular case) years (months, days, hours, etc.) of data.
Problem Definition
In the course of the thread the topic starter has provided the following definitions of the tables:
GO
GO
GO
Solution
The idea here is to use a dynamic PIVOT. To generate the last 10 years of data I am going to use a loop.
The reason I am using a simple direct loop, instead of the more common approach of querying the
table and generating the column list with the FOR XML PATH solution, is that:
1) In theory we may have missing data in our table (this is a more theoretical concern with years, but not
uncommon with months or days).
2) A direct loop is more flexible and allows us to add more columns if needed. Say, it is easy to adjust
the solution to show not only the year column, but also the percent difference between each year and
the prior year, for example.
So, you can see we used a dynamic PIVOT to generate the desired output, and then the sp_executesql
system stored procedure to run our query with two date parameters.
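The original script is not reproduced here, but a hedged sketch of the direct-loop approach could look like this; dbo.SalesData, OrderDate, Amount and CustomerID are illustrative names, not from the thread:

```sql
DECLARE @Columns NVARCHAR(MAX) = N'',
        @SQL     NVARCHAR(MAX),
        @Year    INT = YEAR(CURRENT_TIMESTAMP) - 9;

-- Build the column list for the last 10 years with a direct loop; a year
-- missing from the data still gets a column (it simply comes back as NULL).
WHILE @Year <= YEAR(CURRENT_TIMESTAMP)
BEGIN
    SET @Columns += CASE WHEN @Columns = N'' THEN N'' ELSE N', ' END
                  + QUOTENAME(CONVERT(NVARCHAR(4), @Year));
    SET @Year += 1;
END;

SET @SQL = N'SELECT CustomerID, ' + @Columns + N'
FROM (SELECT CustomerID, YEAR(OrderDate) AS Yr, Amount
      FROM dbo.SalesData
      WHERE YEAR(OrderDate) >= YEAR(CURRENT_TIMESTAMP) - 9) AS src
PIVOT (SUM(Amount) FOR Yr IN (' + @Columns + N')) AS pvt;';

EXECUTE sp_executesql @SQL;
```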
Conclusion
I showed how easily we can generate a report for the last N years (months, days, hours) of data, and
how easily we can add more columns to the output using the direct loop solution.
CHAPTER 5:
Relational Division
T-SQL: Relational Division
In this article I am going to discuss one of the problems of relational algebra, which was recently
brought up in the Transact-SQL MSDN Forum thread T-sql - finding all sales orders that have similar
products.
Introduction
There are certain kinds of problems in relational databases that may be solved using principles from
relational division. There are many articles on the Internet about relational division and relational
algebra. I list just a few very interesting ones, Divided We Stand: The SQL of Relational
Division by Celko and Relational division by Peter Larsson, and suggest readers take a look
at them and other articles on this topic. Peter also pointed me to the new and very interesting
article Relationally Divided over EAV, which I am going to study in the next couple of days.
Problem Definition
In the aforementioned thread, the topic starter first wanted to find all orders that have similar products.
He provided the table definition along with a few rows of data.
Rather than using data from that thread I want to consider the same problem but
using AdventureWorks database instead. So, I'll first show a solution for the problem of finding orders
that have the same products.
Solutions
This problem has several solutions. The first two are true relational division solutions, and the last is a
non-portable, T-SQL-only solution based on de-normalization of the table. The first solution in that
script was suggested by Peter Larsson after I asked him to check this article. I'll post the script I ran to
compare all three solutions:
USE AdventureWorks2012;
SET NOCOUNT ON;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'PESO Solution';
WITH cte
AS (
SELECT SalesOrderID
,STUFF((
SELECT ', ' + CAST(ProductID AS VARCHAR(30))
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = SD.SalesOrderID
ORDER BY ProductID
FOR XML PATH('')
), 1, 2, '') AS Products
FROM Sales.SalesOrderDetail SD
GROUP BY SD.SalesOrderID
)
SELECT cte.SalesOrderID AS OrderID
,cte1.SalesOrderID AS SimilarOrderID
,cte.Products
FROM cte
INNER JOIN cte AS cte1 ON cte.SalesOrderID < cte1.SalesOrderID
AND cte.Products = cte1.Products;
The first solution joins on the number of items in each order and the MIN/MAX product in each order.
It is based on the idea Peter proposed in this closed MS Connect item: Move T-SQL language closer
to completion with a DIVIDE BY operator.
The second solution self-joins the table on ProductID with the extra condition O1.OrderID < O2.OrderID
(we use < instead of <> to avoid mirrored combinations), then groups by both OrderID columns and
uses a HAVING clause to make sure the number of shared products equals the number of products in
each individual order. This HAVING idea is very typical for the relational division problem.
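Since only the XML PATH script is quoted above, here is a hedged sketch of the self-join / HAVING form of relational division described in this paragraph (exact matches, against the same AdventureWorks table):

```sql
-- Exact-match relational division via self-join + HAVING: two orders are
-- similar when every product of one matches a product of the other and the
-- product counts are equal.
WITH OrderCounts AS (
    SELECT SalesOrderID, COUNT(*) AS ProductCount
    FROM Sales.SalesOrderDetail
    GROUP BY SalesOrderID
)
SELECT O1.SalesOrderID AS OrderID,
       O2.SalesOrderID AS SimilarOrderID
FROM Sales.SalesOrderDetail AS O1
INNER JOIN Sales.SalesOrderDetail AS O2
        ON O1.ProductID = O2.ProductID
       AND O1.SalesOrderID < O2.SalesOrderID   -- skip mirrored pairs
GROUP BY O1.SalesOrderID, O2.SalesOrderID
HAVING COUNT(*) = (SELECT C.ProductCount FROM OrderCounts AS C
                   WHERE C.SalesOrderID = O1.SalesOrderID)
   AND COUNT(*) = (SELECT C.ProductCount FROM OrderCounts AS C
                   WHERE C.SalesOrderID = O2.SalesOrderID);
```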
Interestingly, the number of combinations in the AdventureWorks database is 1,062,238 (more than the
number of rows in the SalesOrderDetail table itself). This is because many orders consist of only a single
product.
The last solution is rather straightforward: it uses the XML PATH approach to get all products in one row
per order ID, then self-joins on this new Products column. This solution is not portable to other
relational databases; it is specific to T-SQL. Interestingly, it performs better than the second 'true'
relational division solution, as you can see in this picture.
As you can see, the first query takes 0%, the second 60%, and the last 40% of the execution time.
The last solution, however, is also not very flexible and is only suitable for finding exact matches.
These are results I got on SQL Server 2012 SP1 64 bit (they are much better on SQL Server 2014 CTP
according to Peter):
PESO Solution
Table 'SalesOrderDetail'. Scan count 3410626, logical reads 7265595, physical reads 0, read-ahead reads
0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 855922, logical reads 3462746, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 36, logical reads 3292, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 266, logical reads 907592, physical reads 0, read-ahead reads 0, lob logical
reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 12971, physical reads 0, read-ahead reads 0, lob logical
reads 8764, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 62932, logical reads 194266, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Peter sent yet another variation of the solution for an integer Product ID (this solution will not work if
the product ID / item ID uses a character or GUID key).
This solution joins two sets of aggregate information based on the CHECKSUM_AGG function. Checking
all these aggregates is enough to conclude whether the orders consist of the same products or not.
This is the simplest and most ingenious query, and it performs best among all the variations I tried. The
limitation of this query is that it assumes an integer key for the product ID.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 2, logical reads 2492, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
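Peter's exact query is not reproduced in this chapter; a sketch of the idea might look like the following (which particular aggregates are compared is my assumption):

```sql
-- Orders are treated as identical when their item count, product-ID sum and
-- CHECKSUM_AGG over ProductID all match (CHECKSUM_AGG requires an int input):
WITH agg AS (
    SELECT SalesOrderID,
           COUNT(*)                AS Items,
           SUM(ProductID)          AS SumProducts,
           CHECKSUM_AGG(ProductID) AS ChkProducts
    FROM Sales.SalesOrderDetail
    GROUP BY SalesOrderID
)
SELECT A1.SalesOrderID AS OrderID,
       A2.SalesOrderID AS SimilarOrderID
FROM agg AS A1
INNER JOIN agg AS A2
    ON  A1.SalesOrderID < A2.SalesOrderID
    AND A1.Items        = A2.Items
    AND A1.SumProducts  = A2.SumProducts
    AND A1.ChkProducts  = A2.ChkProducts
ORDER BY OrderID, SimilarOrderID;
```

Comparing several independent aggregates makes a false match very unlikely, although in theory two different product sets could still collide on all of them.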
In that thread the topic starter also wanted to compare orders based on partial similarity. You may
recognize this problem as 'Customers who bought this item also bought...', often seen on various
websites.
Say we want to find orders that have 2/3 or more of their products matching. For this problem we will
only consider orders with more than 2 items (3 and up). The first solution can be easily adjusted for
this new problem:
WITH cte
AS (
SELECT SalesOrderID
,ProductID
,COUNT(ProductID) OVER (PARTITION BY SalesOrderID) AS ProductsCount
FROM Sales.SalesOrderDetail
)
SELECT O1.SalesOrderId AS OrderID
,O2.SalesOrderID AS SimilarOrderID
FROM cte O1
INNER JOIN cte O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
WHERE O1.ProductsCount >= 3
AND O2.ProductsCount >= 3
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(O1.ProductID) >= (
(
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = O1.SalesOrderID
) * 2.0
) / 3.0
AND COUNT(O2.ProductID) >= (
(
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD2
WHERE SD2.SalesOrderID = O2.SalesOrderID
) * 2.0
) / 3.0
ORDER BY OrderID
,SimilarOrderID;
We can verify our results for the first few rows:
SELECT SalesOrderID
,stuff((
SELECT ', ' + cast(ProductID AS VARCHAR(30))
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = SD.SalesOrderID
ORDER BY ProductID
FOR XML PATH('')
), 1, 2, '') AS Products
FROM Sales.SalesOrderDetail SD
WHERE SalesOrderID IN (
      43659
      ,43913
      ,44528
      ,44566
      ,44761
      ,46077
      )
GROUP BY SalesOrderID
ORDER BY SalesOrderID
I will show two variations of the solution for the similar-orders problem. While I get better reads for the
second query, the execution time is much better for the first query:
-- @Percentage holds the required similarity ratio (this declaration was missing
-- from the printed listing and is added here)
DECLARE @Percentage DECIMAL(10, 4) = 2.0 / 3.0 ;

WITH cte
AS (
SELECT SalesOrderID
,ProductID
,COUNT(ProductID) OVER (PARTITION BY SalesOrderID) AS ProductsCount
FROM Sales.SalesOrderDetail
)
SELECT O1.SalesOrderId AS OrderID
,O2.SalesOrderID AS SimilarOrderID
FROM cte O1
INNER JOIN cte O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
WHERE O1.ProductsCount >= 3
AND O2.ProductsCount >= 3
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(O1.ProductID) >= (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = O1.SalesOrderID
) * @Percentage
AND COUNT(O2.ProductID) >= (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD2
WHERE SD2.SalesOrderID = O2.SalesOrderID
) * @Percentage
ORDER BY OrderID
,SimilarOrderID;
WITH cte
AS (
SELECT SalesOrderID
,COUNT(ProductID) AS Items
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
)
SELECT O1.SalesOrderId AS OrderID
,MIN(C1.Items) AS [Products 1]
,O2.SalesOrderID AS SimilarOrderID
,MIN(C2.Items) AS [Products 2]
FROM Sales.SalesOrderDetail O1
INNER JOIN cte C1 ON O1.SalesOrderID = C1.SalesOrderID
INNER JOIN Sales.SalesOrderDetail O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
INNER JOIN cte C2 ON O2.SalesOrderID = C2.SalesOrderID
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
-- the HAVING clause appears to have been truncated in the printed listing;
-- this reconstruction mirrors the first variation's 2/3 threshold
HAVING MIN(C1.Items) >= 3
AND MIN(C2.Items) >= 3
AND COUNT(*) >= MIN(C1.Items) * 2.0 / 3.0
AND COUNT(*) >= MIN(C2.Items) * 2.0 / 3.0;
Conclusion
In this article I showed that some common relational division problems can be solved using set-based
solutions. These solutions may not perform well, however, on big datasets. I encourage readers to share
their ideas for the listed problems, with their pros and cons.
CHAPTER 6:
EOMONTH
Whenever we wanted to identify the end date of a month, we had a problem: there was no built-in
function. That problem is solved in SQL Server 2012. The EOMONTH function returns the last day of
the month.
Output: 2012-02-29
You can specify a number of months in the past or future with the EOMONTH function.
Output: 2012-01-31
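The queries producing the two outputs above were lost in extraction; a hedged reconstruction follows (the input date is an assumption chosen to match the outputs):

```sql
-- End of the month containing the given date:
SELECT EOMONTH('2012-02-14');       -- 2012-02-29
-- The optional second argument shifts by a number of months (here, one month back):
SELECT EOMONTH('2012-02-14', -1);   -- 2012-01-31
```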
CHOOSE
Use this function to select a specific item, by its one-based index, from a list of values.
Output: AGM
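The query behind this output was not preserved; a minimal sketch, with list values that are my assumption, chosen so the result is 'AGM':

```sql
-- CHOOSE returns the item at the given 1-based index:
SELECT CHOOSE(2, 'EGM', 'AGM', 'Board Meeting');   -- AGM
```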
CONCAT
This function concatenates two or more strings.
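A minimal sketch of CONCAT (the literal strings are illustrative only):

```sql
-- CONCAT (SQL Server 2012+) treats NULL arguments as empty strings,
-- unlike the + operator, which yields NULL:
SELECT CONCAT('SQL', ' ', 'Server', NULL, ' 2012');   -- SQL Server 2012
```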
LEAD
Using this function you can access data from a subsequent row in the same result set without the use of
a self-join.
SELECT BusinessEntityID AS EntityID, YEAR(QuotaDate) AS SalesYear, SalesQuota AS CurrentQuota,
LEAD(SalesQuota, 1, 0) OVER (ORDER BY YEAR(QuotaDate)) AS NextQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 AND YEAR(QuotaDate) IN (2005, 2006);
Output:
EntityID SalesYear CurrentQuota NextQuota
---------------- ----------- --------------------- ---------------------
275 2005 367000.00 556000.00
275 2005 556000.00 502000.00
275 2006 502000.00 550000.00
275 2006 550000.00 1429000.00
275 2006 1429000.00 1324000.00
275 2006 1324000.00 0.00
In previous versions (SQL Server 2008 and earlier), the end of the month had to be computed with a
popular albeit obscure DATEADD/DATEDIFF trick. In SQL Server 2012 we can simply write:
SELECT EOMONTH(current_timestamp);
-- 2013-06-30
We can add an optional parameter to get the end date for other months:
Using a dynamic parameter, we can get the last day of the previous year. Applying the DATEADD
function, we can obtain the first day of the current year:
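The two queries referred to above did not survive extraction; hedged reconstructions:

```sql
-- Last day of the previous year: step back as many months as have elapsed this year
SELECT EOMONTH(current_timestamp, -MONTH(current_timestamp));
-- First day of the current year: one day after the last day of the previous year
SELECT DATEADD(DAY, 1, EOMONTH(current_timestamp, -MONTH(current_timestamp)));
```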
How SQL Server Determines Type of the Constant
Problem Definition
There was an interesting question asked recently in the Transact-SQL forum: "Basic doubt in Round
function".
Explanation
So, what is happening here? Why are we getting this error? The explanation lies in the way SQL Server
determines the type of a constant. In this particular case it decides that it can use precision 4 and scale
1 (one digit after the decimal point). That precision is not enough to hold the value 1000, and thus
we get the error.
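The failing query from that thread is not shown above; this sketch reproduces the behavior described (the literal 744.0 matches the SQL_VARIANT_PROPERTY example that follows):

```sql
-- 744.0 is typed as numeric(4,1); rounding to the nearest thousand yields 1000,
-- which does not fit precision 4 / scale 1, so this raises an arithmetic overflow:
SELECT ROUND(744.0, -3);
-- An explicit CAST to a wider type avoids the problem:
SELECT ROUND(CAST(744.0 AS NUMERIC(10, 1)), -3);   -- 1000.0
```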
We can verify the type, precision and scale using the following query:
SELECT
SQL_VARIANT_PROPERTY(744.0, 'BaseType') as BaseType,
SQL_VARIANT_PROPERTY(744.0, 'Precision') as Precision,
SQL_VARIANT_PROPERTY(744.0, 'Scale') as Scale,
SQL_VARIANT_PROPERTY(744.0, 'MaxLength') as MaxLength
which returns: numeric, 4, 1, 5 (the constant 744.0 is typed as numeric(4,1) and stored in 5 bytes).
This page in BOL shows which types constants can have, but it does not explain the rules for how SQL
Server figures them out.
All constants have datatypes. Integer constants are given datatype int, decimal values are given datatype
numeric(p,q) where p is the number of digits (not counting leading zeros) in the number, and q is the
number of digits to the right of the decimal point (including trailing zeroes).
Conclusion
As shown in this article it is better to explicitly CAST to the desired type rather than rely on SQL Server
making the decision.
Understanding NOLOCK Query Hint
In our day-to-day T-SQL querying we use a lot of query hints to modify the way a particular query is
executed.
When we specify a query hint, SQL Server produces a plan that honors it. This can be dangerous if it has
not been tested in UAT first, because the plan the optimizer produces on its own is generally the best
one.
The optimizer's low-level algorithm is not public; how it arrives at the best, most cost-effective plan is
not known to the outside world, but we know that it does.
Query hints specify that the indicated hints should be used throughout the query, and they affect all
operators in the statement. One such query hint is NOLOCK. As the name suggests, many users believe
that when this hint is specified, the operation takes no locks at all. This is not the case!
I will demonstrate it using a simple query. I create a simple table with "e_id" as the primary key column,
plus "name", "address" and "cell_no" columns.
-- demonstration script (the SELECT was lost in extraction; dbo.Employee is an
-- assumed table name matching the columns described above)
BEGIN TRAN
SELECT * FROM dbo.Employee ;
EXEC sp_lock
As you can see below, this transaction runs under SPID 55, the ID of the session that just executed the
code. It has taken two locks: IS and S.
In the Mode column:
S = Shared lock
IS = Intent Shared
In the Type column:
DB = Database
TAB = Table
Now let us run the same query with the NOLOCK query hint and see whether it actually takes any lock.
-- same query, now with the NOLOCK hint (SELECT reconstructed)
BEGIN TRAN
SELECT * FROM dbo.Employee WITH (NOLOCK) ;
EXEC sp_lock
As you can see, the same locks (IS and S) are taken on the same table (see the ObjId in both figures; it is
the same, 1131151075).
So the point is: what is the difference between executing a query with the NOLOCK hint and executing
one without it?
The difference appears when both queries try to select data from a table on which an exclusive lock is
held, that is, when the query accesses a table locked by an INSERT/UPDATE statement.
I will show this with a query. Let us run an UPDATE command on the same table for the same row.
-- update the same row without committing (UPDATE reconstructed; values are assumptions)
BEGIN TRAN
UPDATE dbo.Employee SET name = 'test' WHERE e_id = 1 ;
EXEC sp_lock
Now I run the same queries, Query 1 and Query 2.
Query 2 is the query that does not use any query hint.
Now we see the difference: the query with the NOLOCK hint produced output, but the plain query with
no hint produced none. It is blocked, which can be seen by running sp_who2. I ran that and the result is
below:
As you can see, SPID 56 is blocking SPID 55. I then ran the DBCC INPUTBUFFER command to find the text
corresponding to these SPIDs; below is the result:
From the above output it is clear that with the NOLOCK query hint, a transaction can read data from a
table that is locked by an UPDATE/INSERT/DELETE statement holding an exclusive lock (an exclusive lock
is not compatible with any other lock). But without the NOLOCK hint, the same SELECT in the transaction
is blocked by the UPDATE statement.
The drawback of NOLOCK is dirty reads, so it is not advisable to use it in a production environment. It
can, however, be used to read data from a table partition that will not be updated while the SELECT is
running. For example, you can query the partition containing January 2013 data, assuming no records
for January will be updated.
SET ANSI_PADDING Setting and Its Importance
Problem Description
Recently I got an interesting escalation to solve for a client. Our VFP-based application was getting the
following SQL Server error: "Violation of PRIMARY KEY constraint 'rep_crit_operator_report'.
Cannot insert duplicate key in object 'dbo.rep_crit'. The duplicate key value is (ADMIN,
REPORT_PERIOD_SALES)."
Investigation
I started my investigation of the problem by checking VFP code and finding it to be a bit sloppy with no
good error handling (the code was issuing a TABLEUPDATE without checking its return status).
I then connected to the client through TeamViewer and observed the error in action. I also fired up SQL
Server Profiler and found that the TABLEUPDATE command was attempting to do an INSERT instead of
an UPDATE and was therefore failing with the above error. At that point I was afraid we would not be
able to solve the problem without fixing the source code.
In the VFP source code we were always padding the report column which was defined as varchar(20) to
20 characters. I am not sure why we were doing it this way and why in this case we were not using
CHAR(20) instead of VARCHAR(20) since the value was always saved with extra spaces at the end. But
since this code was there for a long time, I didn't try to question its validity.
At that point I decided to test the actual length of the report column values saved in the table. So I ran
the following query:
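The query itself was not preserved; it was presumably along these lines (the column name report follows the narrative; note that DATALENGTH, unlike LEN, counts trailing spaces):

```sql
-- Check the stored length of each distinct report value:
SELECT report, DATALENGTH(report) AS StoredLength
FROM dbo.rep_crit
GROUP BY report;
```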
To my surprise, I saw values less than 20. I ran the same code in my local database and got the expected
value of 20 for all rows. The strange behavior on the client was a bit perplexing.
I then thought I would try to fix the problem, and ran the following UPDATE statement to pad the
column with spaces at the end. Again, I verified the code locally first. I ran it on the client, then ran the
first SELECT statement again and got the same result as before: the column still showed a length of
less than 20 characters.
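The UPDATE mentioned above did not survive extraction; a hypothetical reconstruction:

```sql
-- Pad the report column to 20 characters with trailing spaces:
UPDATE dbo.rep_crit
SET report = LEFT(report + SPACE(20), 20);
```

With ANSI_PADDING OFF baked into the column, trailing blanks are trimmed when the value is stored, which would explain why this UPDATE appeared to have no effect.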
Resolution
To be honest, I should have guessed what was happening by myself, but I must admit that I still didn't. I
sent an e-mail to my colleagues asking what they thought about that strange behavior, and I also posted
the thread Weird problem with the client. A colleague immediately recognized the problem as one he
had already experienced with another client, and Latheesh NK also pointed to the SET
ANSI_PADDING setting as a possible culprit.
So, somehow several tables had been saved with the wrong ANSI_PADDING setting in effect, and
therefore the columns' setting overrode the session settings.
Recently I made a change in our VFP applications to save varchar columns as varchar (prior to that all
varchar columns were automatically padded with spaces to their length). This caused the above
mentioned problem when the client upgraded the software to the recent release version.
The solution to that particular error was to run an ALTER TABLE statement altering the report column to
the same width as the original, but with SET ANSI_PADDING ON issued before running the statement.
This fixed the wrong padding setting on the column.
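A sketch of that fix (the column width follows the varchar(20) definition mentioned earlier; NOT NULL is assumed since the column participates in the primary key):

```sql
SET ANSI_PADDING ON;
ALTER TABLE dbo.rep_crit ALTER COLUMN report VARCHAR(20) NOT NULL;
```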
This is how we can check a column's status in design mode: right-click the column and check its
properties:
After the problem was identified, we wanted to check the scope of the problem and also correct it for
other columns that had been saved with the wrong ANSI_PADDING setting.
Script to correct problem in the whole database
I came up with the following script to correct the problem:
;WITH cte
AS (
SELECT c.is_nullable
,c.object_id AS table_id
,OBJECT_NAME(c.object_id) AS TableName
,c.max_length
,c.NAME column_name
,CASE c.is_ansi_padded
WHEN 1
THEN 'On'
ELSE 'Off'
END AS [ANSI_PADDING]
,T.NAME AS ColType
FROM sys.columns c
INNER JOIN sys.types T ON c.system_type_id = T.system_type_id
WHERE T.NAME IN ('varbinary', 'varchar')
)
SELECT 'ALTER TABLE dbo.' + quotename(cte.TableName) + ' ALTER COLUMN ' +
QUOTENAME(cte.column_name) + ' ' + cte.ColType + '(' + CASE
WHEN cte.max_length = - 1
THEN 'max'
ELSE CAST(cte.max_length AS VARCHAR(30))
END + ')' + CASE
WHEN cte.is_nullable = 1
THEN ' NULL '
ELSE ' NOT NULL'
END
FROM cte
INNER JOIN (
SELECT objname
FROM fn_listextendedproperty('SIRIUS_DefaultTable', 'user', 'dbo', 'table',
NULL, NULL, NULL)
) st ON st.objname = cte.TableName
AND cte.ANSI_PADDING = 'Off'
In this code the extra INNER JOIN is done to perform the update only on our own tables in the database.
In the generic case you don't need this extra JOIN.
We need to run the code above using the Results to Text option from the Query menu. Then we can
copy the output of the statement into a new query window and run it to fix the problem.
Default Database Settings
I discussed this problem in one more thread SET ANSI_PADDING setting . This thread provides
additional insight into the importance of the correct setting.
It would be logical to expect that when we create a new database, the default settings have correct
values for SET ANSI_NULLS and SET ANSI_PADDING. However, this is not the case, even in SQL Server
2012. If we don't change the database defaults, they all come up wrong. See them here:
Therefore if we want correct settings on the database level, it may be a good idea to fix them at the
moment we create a new database. However, these settings are not very important since they are
overwritten by the session settings.
As noted in the comments, another interesting case of varbinary truncation due to this wrong setting is
found in this Transact-SQL forum thread.
All-at-Once Operations in T-SQL
I remember when I read about this concept in a book by Itzik Ben-Gan in 2006; I was so excited that I
could not sleep until daylight. When I encountered a question about this concept in the MSDN Forum, I
answered it with the same passion with which I had read about this mysterious concept, and I decided
to write an article about it. I ask you to be patient and not follow the link to the question until you have
finished reading this article. Please wait even if you know this concept completely, because I hope this
will be an amazing trip.
Introduction
Each SQL query statement is made up of several clauses, and each clause helps us achieve the expected
result. Simply put, in one SELECT query we have some of these clauses:
SELECT
FROM
WHERE
GROUP BY
HAVING
Each of these performs one logical query processing phase. T-SQL is based on Sets and logic. When we
run a query against a table, the expected result is in fact a sub-set of that table. With each phase we
create a smaller sub-set until we get our expected result, and in each phase we perform a process over
the whole sub-set's elements. The next figure illustrates this:
Definition
All-at-Once
"All-at-Once Operations" means that all expressions in the same logical query process phase are
evaluated logically at the same time.
-- query (the second column expression was truncated in the original; it is
-- reconstructed here to match the narrative below)
SELECT
LTRIM( RTRIM( FirstName ) ) AS [Corrected FirstName],
[Corrected FirstName] + ' ' + LastName AS [Full Name]
FROM @Test ;
As this figure illustrates, after executing the query we encounter this error message:
Invalid column name 'Corrected FirstName'.
This error message means that we cannot use an alias in the next column expression of the same SELECT
clause. In the query we create a corrected first name and want to use it in the next column to produce
the full name, but the All-at-Once operations concept tells us we cannot do this, because all expressions
in the same logical query processing phase (here, SELECT) are evaluated logically at the same time.
Because T-SQL is a query language over Relational Database System (Microsoft SQL SERVER), it deals
with Sets instead of variables. Therefore, query must be operated on a Set of elements. Now I want to
show another example to illustrate this.
-- drop test table
IF OBJECT_ID( 'dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
-- create test table (the column definitions were truncated in the original
-- and are reconstructed here)
CREATE TABLE dbo.Test
(
Id INT NOT NULL PRIMARY KEY,
ParentId INT NULL,
CONSTRAINT FK_Self_Ref
FOREIGN KEY ( ParentId )
REFERENCES dbo.Test ( Id )
);
GO
-- insert query
INSERT dbo.Test
( Id, ParentId )
VALUES ( 1, 2 ), -- there is not any id = 2 in table
( 2, 2 ) ;
-- update query
UPDATE dbo.Test
SET Id = 7,
ParentId = 7 -- there is not any id = 7 in table
WHERE Id = 1 ;
After executing this code, as shown in the following figure, we see that even though there is no row with
Id = 2 in the table yet, we can still insert 2 as a foreign key value in the same statement. This is because
of All-at-Once operations.
As the next figure illustrates, the same behavior repeats for the UPDATE query. If we did not have the
All-at-Once operations feature, we would first have to insert or update the primary key of the table and
only then modify the foreign key.
Many programmers who are experts in non-SQL languages like C# and VB are confused by this behavior
at first, because they are in the habit of processing a variable on one line of code and using the
processed variable on the next line, and they expect to do something similar in T-SQL. But as I noted
earlier, T-SQL is a query language over a Relational Database System (Microsoft SQL Server), and it deals
with Sets instead of variables. Therefore, the query must operate on a Set of elements at the same time.
Moreover, in each logical query processing phase, all expressions are processed logically at the same
point in time.
Pros and Cons
This concept has an impact on every situation in T-SQL querying. Sometimes it makes things harder to
do, and sometimes it enables an elegant technique that we would not expect. To illustrate these impacts
I will explain four real situations with their examples.
Silent Death
One of the problems that lack of attention to the All-at-Once operations concept can produce is code
that encounters an unexpected error.
We know that the square root of a negative number is undefined (over the real numbers). So in the code
below we put two conditions inside the WHERE clause; the first condition checks that Id1 is greater than
zero. This query might still encounter an error, because the All-at-Once operations concept tells us that
these two conditions are evaluated logically at the same point in time. SQL Server may short-circuit (if
the first expression it evaluates is FALSE, the whole WHERE condition is FALSE), but it is free to evaluate
the conditions in the WHERE clause in an arbitrary order, based on the estimated execution plan, so
SQRT(Id1) may be evaluated for rows where Id1 is negative.
-- query
SELECT *
FROM dbo.Test
WHERE
id1 > 0
AND
SQRT(Id1) = 1
If you do not receive any error after executing the above code, we need to make some changes to our
code to force SQL Server to choose another order when evaluating the conditions in the WHERE clause.
-- helper function (the CREATE FUNCTION header was lost in the original and is
-- reconstructed here; it returns the Id2 value of a row with Id1 < 1)
CREATE FUNCTION dbo.fnZero ()
RETURNS INT
AS
BEGIN
DECLARE @Result INT ;
SET @Result = ( SELECT TOP (1) Id2 FROM dbo.Test WHERE Id1 < 1 );
RETURN @Result;
END
GO
-- query
SELECT *
FROM dbo.Test
WHERE
id1 > dbo.fnZero()
AND
SQRT(Id1) = 1
-- query
SELECT *
FROM dbo.Test
WHERE
CASE
WHEN Id1 < dbo.fnZero() THEN 0
WHEN SQRT(Id1) = 1 THEN 1
ELSE 0
END = 1;
CAUTION
After publishing this article, Naomi Nosonovsky pointed out to me that "even CASE does not provide
deterministic order of evaluation with short circuiting".
Now let us see another example. Although we add a condition in the HAVING clause to check that Id2 is
different from zero, because of the All-at-Once operations concept there is still a probability of
encountering a divide-by-zero error.
-- query
SELECT Id2, SUM(Id1)
FROM dbo.Test
GROUP BY Id2
HAVING
id2 <> ( SELECT Id2 FROM dbo.Test WHERE Id1 < 1 ) /* this
subquery returns zero*/
AND
SUM(Id1) / Id2 = 3 ;
Therefore, lack of attention to the All-at-Once operations concept in T-SQL might result in unexpected
errors!
Code complexity
Moreover, this concept adds complexity to debugging T-SQL code. Suppose we have a table "Person"
with two columns, "FirstName" and "LastName". For some reason the values within these columns are
mixed with extra characters. The problem is to write a query that retrieves a new column as
Full Name. This code produces our test data:
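The test-data script did not survive extraction; a hypothetical stand-in consistent with the description that follows:

```sql
-- Hypothetical test data: first names carry trailing numbers, last names carry
-- extra surrounding spaces
CREATE TABLE dbo.Person (
    PersonId  INT IDENTITY(1, 1) PRIMARY KEY,
    FirstName NVARCHAR(100),
    LastName  NVARCHAR(100)
);
INSERT dbo.Person (FirstName, LastName)
VALUES (N'John 12345', N'   Smith   '),
       (N'Sara 678',   N'  Connor ');
```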
As illustrated in this figure, the problem with the "FirstName" column is that it is mixed with extra
numbers that should be removed, and the problem with the "LastName" column is that it is mixed with
extra space characters before and after the real last name. Here is the code to do this:
SELECT PersonId ,
LEFT( LTRIM( RTRIM( FirstName ) ) , CHARINDEX( N' ' , LTRIM( RTRIM(
FirstName ) ) ) - 1 ) + N' ' + LTRIM( RTRIM( LastName ) ) AS [FullName]
FROM dbo.Person ;
Because of All-at-Once operations we cannot use an alias in the next column expression of the SELECT
clause, so the code can be very complex to debug.
I found that one way to ease this is using the right code style and extra comments. The next code is a
well-formed restyling of the former code; it has the same output and is easier to debug.
SELECT PersonId ,
/*
Prototype:
[FullName] ::: LEFT( [FirstName Trim],
[Index of first space character in FirstName Trim] - 1 ) + ' '+ [Corrected
LastName]
elements:
[FirstName Trim] ::: LTRIM( RTRIM( FirstName ) )
[Index of first space character in FirstName Trim] ::: CHARINDEX( N' ' ,
[FirstName Trim] )
[Corrected LastName] ::: LTRIM( RTRIM( LastName ) )
*/
LEFT( LTRIM( RTRIM( FirstName ) )                           -- [FirstName Trim]
    , CHARINDEX( N' ' , LTRIM( RTRIM( FirstName ) ) ) - 1   -- [Index of first space character in FirstName Trim]
    )
+ N' '
+ LTRIM( RTRIM( LastName ) )                                -- [Corrected LastName]
AS [FullName]
FROM dbo.Person ;
Other solutions are "creating modular views" or "using Derived Table or CTE". I showed "creating
modular view" approach in this Forum thread .
This concept also explains why we cannot use window functions in the WHERE clause. We use a reductio
ad absurdum argument, like those used in mathematics. Suppose that we could use window functions in
the WHERE clause, and consider the following code.
SELECT Id
FROM dbo.Test
WHERE
Id = 1002
AND
ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us these two conditions are evaluated logically at the same point in time.
Therefore, SQL Server could evaluate the conditions in the WHERE clause in an arbitrary order, based on
the estimated execution plan. But the result of ROW_NUMBER() depends on which rows it is evaluated
over: if the filter Id = 1002 were applied first, the remaining row would be row number 1; if
ROW_NUMBER() were computed first over the whole table, that row would generally not be number 1.
The same query would return different results depending on the evaluation order, which contradicts the
premise that both conditions are evaluated at the same point in time.
So we have a paradox.
This example shows why we cannot use window functions in the WHERE clause. You can think more
about this and find out why window functions are allowed only in the SELECT and ORDER BY clauses!
Magic Update
This is the most exciting part of this article that I love it. The question is that how to swap values of two
columns in a table without using a temporary table? This code provide sample data for us:
GO
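The sample-data code was lost in extraction; it was presumably something like:

```sql
-- Hypothetical sample data for the swap demonstration:
CREATE TABLE dbo.Person (
    PersonId  INT PRIMARY KEY,
    FirstName NVARCHAR(50),
    LastName  NVARCHAR(50)
);
INSERT dbo.Person VALUES (1, N'John', N'Smith'), (2, N'Sara', N'Connor');
```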
Consider that in most non-SQL languages we have to use a temporary variable to swap the values of two
variables. If we look at the problem from a non-SQL programmer's point of view, we would write
something like this prototype:
update Person
set @swap=Firsname
set Firstname=Lastname
set lastname=@swap
If we look at the problem as a SQL programmer, we can translate the above prototype by using a
temporary table "#swap". The code would be like this:
SELECT PersonId,
FirstName ,
LastName
INTO #swap
FROM dbo.Person ;
UPDATE dbo.Person
SET FirstName = a.LastName ,
LastName = a.FirstName
FROM #swap a
INNER JOIN dbo.Person b ON a.PersonId = b.PersonId
This code works fine. But the main question is: how much time does the above script need if we have
millions of records?
If we know the All-at-Once operations concept in T-SQL, we can do this job with one UPDATE statement,
using the following simple code:
UPDATE dbo.Person
SET FirstName = LastName ,
LastName = FirstName ;
Exception
In definition section I noted that the query must be operated on a Set of elements. What will happen if a
query deal with multiple tables? In such queries we use table operators like JOIN and APPLY inside
FROM clause. By the way, these operators are logically evaluated from left to right. Because we have
multiple Sets, first we need to transform them to a Set then we have All-at-Once operations concept.
Therefore, this concept is not applicable to the table operators in FROM clause.
Conclusion
All-at-Once operations is one of the most important concept in T-SQL that has extreme impact on our T-
SQL programming, code style and performance tuning solutions.
SQL Server Columnstore Index FAQ
The SQL Server in-memory columnstore index (formerly called xVelocity) stores data by columns instead
of by rows, similar to a column-oriented DBMS. The columnstore index speeds up data warehouse query
processing in SQL Server 2012 and SQL Server 2014, in many cases by a factor of 10 to 100. We'll be
posting answers to frequently asked questions here.
SQL Server 2012 introduced nonclustered columnstore indexes. For more information, see the 2012
version of Columnstore Indexes on MSDN.
SQL Server 2014 has both clustered and nonclustered columnstore indexes, and both of these indexes
are updateable. For more information, see the 2014 pre-release version of Create Columnstore Index
(Transact-SQL) and Columnstore Indexes .
For both SQL Server 2012 and SQL Server 2014, see the wiki article SQL Server Columnstore
Performance Tuning on Technet.
Contents
1. Overview
2. Creating a Columnstore Index
3. Limitations on Creating a Columnstore Index
4. More Details on Columnstore Technology
5. Using Columnstore Indexes
6. Managing Columnstore Indexes
7. Batch Mode Processing
1. Overview
Microsoft SQL Server has a family of in-memory technologies. These are all next-generation technologies
built for extreme speed on modern hardware systems with large memories and many cores. The in-
memory technologies include in-memory analytics engine (used in PowerPivot and Analysis Services),
and the in-memory columnstore index (used in the SQL Server database).
SQL Server 2012, SQL Server 2014, and SQL Server PDW all use in-memory technologies to accelerate
common data warehouse queries. SQL Server 2012 introduced two new features: a nonclustered
columnstore index and a vector-based query execution capability that processes data in units called
"batches." Now, SQL Server 2014 adds updateable clustered columnstore indexes.
What is a columnstore?
A columnstore is data that is logically organized as a table with rows and columns, and physically stored
in a columnar data format. Relational database management systems traditionally store data in row-
wise fashion. The values comprising one row are stored contiguously on a page. We sometimes refer to
data stored in row-wise fashion as a rowstore.
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data
format, called a columnstore. The data is compressed, stored, and managed as a collection of partial
columns, called column segments. You can use a columnstore index to answer a query just like data in
any other type of index.
A columnstore index appears as an index on a table when examining catalog views or the Object
Explorer in Management Studio. The query optimizer considers the columnstore index as a data source
for accessing data just like it considers other indexes when creating a query plan.
For nonclustered columnstore indexes, all you have to do is create a nonclustered columnstore index on
one or more tables in your database. The query optimizer will decide when to use the columnstore
index and when to use other types of indexes. The query optimizer will also choose when to use the new
batch execution mode and when to use row execution mode.
For clustered columnstore indexes, you need to first create a table as a heap or clustered index, and
then use the CREATE CLUSTERED COLUMNSTORE INDEX statement to convert the existing table to a
clustered columnstore index. If your existing table has indexes, you need to drop all indexes, except for
the clustered index, before creating a clustered columnstore index. Since the clustered columnstore
index is the data storage mechanism for the entire table, the clustered columnstore index is the only
index allowed on the table.
Nonclustered columnstore indexes are available in SQL Server 2012. Clustered columnstore indexes are
in the preview releases of SQL Server 2014 and will ship in the final release.
You can create a nonclustered columnstore index by using a slight variation on existing syntax for
creating indexes. To create an index named mycolumnstoreindex on a table named mytable with three
columns, named col1, col2, and col3, you would use the following syntax:
CREATE NONCLUSTERED COLUMNSTORE INDEX mycolumnstoreindex ON mytable (col1, col2, col3);
To avoid typing the names of all the columns in the table, you can use the Object Explorer in
Management Studio to create the index as follows:
1. Expand the tree structure for the table and then right click on the Indexes icon.
2. Select New Index and then Nonclustered columnstore index
3. Click Add in the wizard and it will give you a list of columns with check boxes.
4. You can either choose columns individually or click the box next to Name at the top, which will
put checks next to all the columns. Click OK.
5. Click OK.
When you create a clustered columnstore index, there is no need to specify columns since all columns in
the table are included in the index. This example converts a clustered index called myindex into a
clustered columnstore index.
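The conversion statement itself is missing here; per the documented CREATE COLUMNSTORE INDEX syntax (using the table and index names from the surrounding text), it would be:

```sql
-- DROP_EXISTING = ON replaces the existing clustered index with a clustered
-- columnstore index of the same name:
CREATE CLUSTERED COLUMNSTORE INDEX myindex
ON mytable
WITH (DROP_EXISTING = ON);
```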
Does it matter what order I use when listing the columns in the CREATE INDEX statement?
No. When the columnstore index is created, it uses a proprietary algorithm to organize and compress
the data.
Typically, you will put all the columns in a table in the columnstore index, although it is not necessary to
include all the columns. The limit on the number of columns is the same as for other indexes (1024
columns). If you have a column that has a data type that is not supported for columnstore indexes, you
must omit that column from the columnstore index.
A columnstore index can include columns with the following data types: int, bigint, smallint, tinyint,
money, smallmoney, bit, float, real, char(n), varchar(n), nchar(n), nvarchar(n), date, datetime,
datetime2, smalldatetime, time, datetimeoffset with precision <= 2, decimal or numeric with precision
<= 18.
What data types cannot be used in a columnstore index?
The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18,
datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max),
cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml.
How long does it take to create a columnstore index? Is creating a columnstore index a parallel
operation?
Creating a columnstore index is a parallel operation, subject to the limitations on the number of CPUs
available and any restrictions set on MaxDOP. Creating a columnstore index takes on the order of 1.5
times as long as building a B-tree on the same columns.
My MAXDOP is greater than one but the columnstore index was created with DOP = 1. Why was it
not created using parallelism?
If your table has less than one million rows, SQL Server will use only one thread to create the
columnstore index. Creating the index in parallel requires more memory than creating the index
serially. If your table has more than one million rows, but SQL Server cannot get a large enough memory
grant to create the index using MAXDOP, SQL Server will automatically decrease DOP as needed to fit
into the available memory grant. In some cases, DOP must be decreased to one in order to build the
index under constrained memory.
The memory required for creating a columnstore index depends on the number of columns, the number
of string columns, the degree of parallelism (DOP), and the characteristics of the data. SQL Server will
request a memory grant before trying to create the index. If not enough memory is available to create
the index in parallel with the current max DOP, SQL Server will reduce the DOP as needed to get an
adequate memory grant. If SQL Server cannot get a memory grant to build the index with DOP = 1, the
index creation will fail.
A rule of thumb for estimating the memory grant that will be requested for creating a columnstore index
is:
Memory grant request in MB = [(4.2 *Number of columns in the CS index) + 68]*DOP + (Number of
string cols * 34)
What can I do if I do not have enough memory to build the columnstore index?
It's possible for creation of a columnstore index to fail either at the very beginning of execution if it can't
get the necessary initial memory grant, or later during execution if supplemental grants can't be
obtained. If the initial grant fails, you'll see error 8657 or 8658. You may get error 701 or 802 if memory
runs out later during execution. If out-of-memory error 8657 or 8658 occurs at the beginning of
columnstore index creation, first check your resource governor settings. The default setting for resource
governor limits a query in the default pool to 25% of available memory even if the server is otherwise
inactive. This is true even if you have not enabled resource governor. Consider changing the resource
governor settings to allow the create index statement to access more memory. You can do this using
TSQL:
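A sketch of such a change; the 50 percent cap below is an illustrative value, not a recommendation:

```sql
-- Allow a single request in the default workload group to use up to 50% of
-- available memory (the default cap is 25%), then apply the change.
ALTER WORKLOAD GROUP [default]
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 50);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```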
If you get error 701 or 802 later during the index build, that means that the initial estimate of memory
usage was too low, and additional memory was consumed during index build execution and memory ran
out. The only viable way to work around these errors in this case is to explicitly reduce DOP when you
create the index, reduce query concurrency, or add more memory.
For all these error conditions (701, 802, 8657, and 8658), adding more memory to your system may
help.
See SQL Server Books Online for ALTER WORKLOAD GROUP for additional information.
Another way to deal with out-of-memory conditions during columnstore index build is to vertically
partition a wide table into two or more tables so that each table has fewer columns. If a query touches
both tables, the tables will have to be joined, which will affect query performance. If you use this option,
you will want to allocate columns to the different tables carefully so that queries will usually touch only
one of the tables. This option would also affect any existing queries and loading scripts. Another option
is to omit some columns from the columnstore index. Good candidates are columns that are
infrequently touched by queries that require scanning large amounts of data.
In some cases, you may not be able to create a columnstore index due to insufficient memory soon after
the server starts up, but later on it may work. This is because SQL Server, by default, gradually requests
memory from the operating system as it needs it. So it may not have enough memory available to satisfy
a large memory grant request soon after startup. If this happens, you can make the system grab more
memory by running a query like "select count(*) from t" where t is a large table. Or, you can set both
the min server memory and max server memory to the same value using sp_configure, which will
force SQL Server to immediately grab the maximum amount of memory it will use from the operating
system when it starts up.
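A sketch of the sp_configure approach; the 16 GB (16384 MB) value is illustrative:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Pin min and max server memory to the same value (in MB) so SQL Server
-- acquires its full memory allocation up front rather than gradually.
EXEC sp_configure 'min server memory (MB)', 16384;
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;
```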
Can I create a columnstore index on a table that uses PAGE or ROW compression?
Yes. The base table can have PAGE compression, ROW compression, or no compression. The
columnstore index will have its own compression, which cannot be specified by the user.
I tried to create a columnstore index with SQL Server Management Studio using the Indexes->New
Index menu and it timed out after 20 minutes. How can I work around this?
Run a CREATE NONCLUSTERED COLUMNSTORE INDEX statement manually in a T-SQL window instead of
using the graphical interface. This will avoid the timeout imposed by the Management Studio graphical
user interface.
3. Limitations on Creating a Columnstore Index
Can I create a filtered columnstore index?
No. A columnstore index must contain data from all the rows in the table.
Can I create a columnstore index on an indexed view?
No. A columnstore index cannot be created on an indexed view. You also cannot use a columnstore
index to materialize a view.
Can I create more than one columnstore index on a table?
No. You can only create one columnstore index on a table. The columnstore index can contain data from
all, or some, of the columns in a table. Since the columns can be accessed independently from one
another, you will usually want all the columns in the table to be part of the columnstore index.
What are the advantages and disadvantages of row stores and column stores?
When data is stored in column-wise fashion, the data can often be compressed more effectively than
when stored in row-wise fashion. Typically there is more redundancy within a column than within a row,
which usually means the data can be compressed to a greater degree. When data is more compressed,
less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a
given size of memory. Reducing IO can significantly speed up query response time. Retaining more of
your working set of data in memory will speed up response time for subsequent queries that access the
same data.
When data is stored column-wise, it is possible to access the column individually. If a query only
references a few of the columns in the table, it is only necessary for a subset of the columns to be
fetched from disk into memory. For example, if a query references five columns from a table with 50
columns (i.e. 10% of the columns), IO is reduced by 90% (in addition to any benefits from compression).
On the other hand, storing columns in independent structures means that the data must be recombined
to return the data as a row. When a query touches only one (or a few) rows, having all the data for one
row stored together can be an advantage if the row can be quickly located with a B-tree index. Row
stores may offer better query performance for very selective queries, such as queries that look up a
single row or a small range of rows. Updating data is also simpler in a row store.
What is the difference between a pure column store and a hybrid column store?
SQL Server columnstore indexes are pure column stores. That means that the data is stored and
compressed in column-wise fashion and individual columns can be accessed separately from other
columns. A hybrid columnstore stores a set of rows together, but within that set of rows, data is
organized and compressed in column-wise fashion. A hybrid column store can achieve good
compression from a column-wise organization within the set of rows, but when data is fetched from
disk, the pages being fetched contain data from all the columns in each row. Even if a query references
only 10% of the columns in a table, all the columns must be fetched from disk, and unused columns also
take up space in main memory. SQL Server columnstore indexes require less I/O and give better main-
memory buffer pool hit rates than a hybrid columnstore.
Is a columnstore index better than a covering index that has exactly the columns I need for a query?
The answer depends on the data and the query. Most likely the columnstore index will be compressed
more than a covering row store index. If the query is not too selective, so that the query optimizer will
choose an index scan and not an index seek, scanning the columnstore index will be faster than scanning
the row store covering index. In addition, depending on the nature of the query, you can get batch
mode processing when the query uses a columnstore index. Batch mode processing can substantially
speed up operations on the data in addition to the speed up from a reduction in IO. If there is no
columnstore index used in the query plan, you will not get batch mode processing. On the other hand, if
the query is very selective, doing a single lookup, or a few lookups, in a row store covering index might
be faster than scanning the columnstore index.
Another advantage of the columnstore index is that you can spend less time designing indexes. A row
store index works well when it covers all the columns needed by a query. Changing a query by adding
one more column to the select list can render the covering index ineffective. Building one columnstore
index on all the columns in the table can be much simpler than designing multiple covering indexes.
Is the columnstore index the same as a set of covering indexes, one for each column?
No. Although the data for individual columns can be accessed independently, the columnstore index is a
single object; the data from all the columns is organized and compressed as an entity. While the
amount of compression achieved is dependent on the characteristics of the data, a columnstore index
will most likely be much more compressed than a set of covering indexes, resulting in less IO to read the
data into memory and the opportunity for more of the data to reside in memory across multiple
queries. In addition, queries using columnstore indexes can benefit from batch mode processing,
whereas a query using covering indexes for each column would not use batch mode processing.
Is columnstore index data still compressed after it is read into memory?
Yes. Column segments are compressed on disk and remain compressed when cached in memory.
Is a columnstore index a kind of bitmap index?
No. Columnstore indexes use a proprietary data representation based on VertiPaq. It’s not the same as a
bitmap index and doesn’t use one. But it has some similar benefits to bitmap indexes, such as reducing
the time it takes to filter on a column with a small number of distinct values.
I want to show other people how cool SQL Server columnstore indexes are. What can I show them?
OR
Where can I find more information (including documents and videos) about SQL Server columnstore
indexes?
White paper:
http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-
3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf
Product documentation:
http://msdn.microsoft.com/en-us/library/gg492088(SQL.110).aspx
TechEd 2012 talk video, SQL Server Columnstore Performance Tuning, 1 hour, 15 minutes:
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI409
IEEE Data Engineering Bulletin Paper on SQL Server columnstore indexes, March 2012:
http://sites.computer.org/debull/A12mar/apollo.pdf
VertiPaq vs ColumnStore: Performance Analysis of the xVelocity Engine, v1.0, rev 2, Aug 3, 2012.
http://www.sqlbi.com/wp-content/uploads/Vertipaq-vs-ColumnStore1.pdf
Microsoft SQL Server 2012 Columnstore for Real Time Reporting in Manufacturing Automation (COPA-
DATA zenon Analyzer), 2012.
http://www.kreatron.ro/news/newsdetail_65.html
Slide deck on CDR (Telecom) application design loading 100M rows per day with 3 year retention
http://sqlug.be/media/p/1238.aspx
Each physical partition of a columnstore index is broken into one-million-row chunks called segments
(a.k.a. row groups). The index build process creates as many full segments as possible. Because multiple
threads work to build an index in parallel, there may be a few small segments (typically equal to the
number of threads) at the end of each partition with the remainder of the data after creating full
segments. That's because each thread might hit the end of its input at different times. Non-partitioned
tables have one physical partition.
How do I know whether the columnstore index is being used for my query?
You can tell whether a columnstore index is being used by looking at showplan. In graphical showplan,
there is a new icon for columnstore index scans. In addition, columnstore index scans have a new
property, storage, with the value ColumnStore.
Existing hints work with columnstore indexes. If you have a nonclustered columnstore index named
mycsindex on a table named mytable you could use a table hint such as
… FROM mytable WITH (INDEX (mycsindex)) …
You can either use a table hint to force the use of a different index, or you can use a new query hint:
IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX. This new hint will prevent the use of any
nonclustered columnstore indexes in the query.
Below is an example of using the hint to prevent use of any nonclustered columnstore index in a query:
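A sketch, assuming the hypothetical mytable with columns col1 and col2:

```sql
-- The OPTION clause disables any nonclustered columnstore index for this query,
-- forcing the optimizer to use row store access paths instead.
SELECT col1, SUM(col2) AS total
FROM mytable
GROUP BY col1
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
```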
SQL Server columnstores provide the performance benefits of a pure in-memory system with the
convenience and economics of a system that stores data on disk and caches recently used data in
memory. Columnstores hold data in memory in a different format than is kept on disk. This in-memory
representation is highly optimized to support fast query execution on modern processors. Not all data
has to fit in memory with a SQL Server columnstore index. But if all columnstore data does fit in
memory, SQL Server provides pure-in-memory levels of performance.
Why require all data to fit in memory (capping your database size or demanding a large budget to
purchase memory, and demanding slow system startup times) if you can get the best of both worlds,
that is, state-of-the-art query performance on economical hardware?
Does all the data have to fit in memory when I use a columnstore index?
No, a columnstore index is persisted on disk just like any other index. It is read into memory when
needed just like other types of indexes. The columnstore index is divided into units called segments,
which are the unit of transfer. A segment is stored as a LOB, and can consist of multiple pages. We
elected to bring columnstore index data into memory on demand rather than require that all data fits in
memory so customers can access databases much bigger than will fit in main memory. If all your data
fits in memory, you'll get reduced I/O and the fastest possible query performance. But it's not necessary
for all data to fit in memory, and that's a plus.
A columnstore index is read into memory when needed just like other types of indexes.
You cannot force the columnstore index to be loaded, or kept, in memory but you can warm the cache
by running a query that will cause the columnstore data to be read into memory.
When should I build a columnstore index?
Columnstore indexes are designed to accelerate data warehouse queries, not OLTP workloads. Use
columnstore indexes when your query workload entails scanning and aggregating large amounts of data
or joining multiple tables, especially in a star join pattern. The restrictions on how you update the data
will also affect your choice. Columnstore indexes will be easiest to manage if you have a read-mostly
workload and if partition switching to update the data will fit into your workflow. Partition switching for
handling updates is easier if most updates consist of appending new data to the existing table and can
be placed in a staging table that can be switched into the table during periodic load cycles.
Typically you will want to build a columnstore index on large fact tables and maybe on large dimension
tables as well. You can build a columnstore index on very small tables, but the performance advantage
is less noticeable when the table is small. If you frequently update your dimension tables, and they are
not too large, you may find the maintenance effort outweighs the benefit of a columnstore index.
If you frequently update the data in a table, or if you need to update a large table but partition switching
does not fit your workflow, you might not want to create a columnstore index. If most of your queries
are small lookup queries, seeking into a B-tree index may be faster and you may not find a columnstore
index to be beneficial. If you test a columnstore index and it does not benefit your workload, you can
drop or disable the index.
Can you do trickle load and real-time query with a columnstore index?
Yes. Even though tables with a columnstore index are read-only, you can maintain two tables, the one
with the columnstore, and a second table with the same schema structured as a B-tree or heap. The
second table, called a differential file, holds newly inserted rows. You query the combined table by
modifying your queries to aggregate results from the two tables separately, and combine them. This is
called local-global aggregation. Periodically, (say during a nightly batch window) you move data from the
row-structured table to the columnstore table. See here for details and an example on how to do trickle
load.
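A sketch of local-global aggregation, assuming a columnstore table mytable and a hypothetical row-structured differential table mytable_delta with the same schema:

```sql
SELECT col1, SUM(part_total) AS grand_total
FROM (
    -- Local aggregation over the read-only columnstore table...
    SELECT col1, SUM(col2) AS part_total FROM mytable GROUP BY col1
    UNION ALL
    -- ...and over the updatable differential table.
    SELECT col1, SUM(col2) AS part_total FROM mytable_delta GROUP BY col1
) AS combined
GROUP BY col1;  -- Global aggregation combines the two partial results.
```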
6. Managing Columnstore Indexes
Is the data in a columnstore index compressed?
Yes.
The columnstore index is compressed when it is created. You cannot apply PAGE or ROW compression
to a columnstore index. When a columnstore index is created, it uses the VertiPaq™ compression
algorithms, which compress the data more than either PAGE or ROW compression. There is no user
control over compression of the columnstore index.
What is the difference in storage space used between the base table and the columnstore index?
Based on our experiments with a variety of different data sets, columnstore indexes are about 4X to 15X
smaller than an uncompressed heap or clustered B-tree index, depending on the data.
Yes, you can create a columnstore index on a partitioned table. The columnstore index must be
partition-aligned with the base table. If you do not specify a partition scheme when you create the
columnstore index, the index will be automatically created using the same partition scheme as the base
table. You can switch a partition in and out of a partitioned table with the same requirements regarding
matching indexes as exist for other types of clustered and nonclustered indexes.
Yes, you can partition a columnstore index, but the base table must also be partitioned and the
columnstore index must be partition-aligned with the base table.
How do I add to, or modify, the data in a table with a columnstore index?
Once you create a columnstore index on a table, you cannot directly modify the data in that table. A
query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message. To add or modify
the data in the table, you can do one of the following:
Disable or drop the columnstore index. You can then update the data in the table. If you disable
the columnstore index, you can rebuild the columnstore index when you finish updating the
data. For example:
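A sketch, assuming the earlier mycolumnstoreindex on mytable:

```sql
ALTER INDEX mycolumnstoreindex ON mytable DISABLE;
-- The table is now updatable; INSERT, UPDATE, DELETE, or MERGE data here.
ALTER INDEX mycolumnstoreindex ON mytable REBUILD;
```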
Can I disable or rebuild a columnstore index on just one partition?
No. You can only disable or rebuild a columnstore index on the entire table. If you want to rebuild only
one partition, you should switch the partition into an empty staging table, disable/rebuild the index on
the staging table, and switch the staging table back into the main table. There is no need to rebuild the
index except when you want to modify the data in the table.
There are two ways to determine whether a columnstore exists on a table. In Management Studio, you
can look at the Object Explorer. Each table has an entry for Indexes. Columnstore indexes are included in
the list of indexes and have their own icon and description. You can also look at various catalog tables. In
sys.indexes, a columnstore index has type = 6 and type_desc = “NONCLUSTERED COLUMNSTORE.” A
new catalog table, sys.column_store_index_stats, has one row for each columnstore index.
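For example, a sketch listing all nonclustered columnstore indexes via sys.indexes:

```sql
SELECT OBJECT_NAME(object_id) AS table_name,
       name AS index_name
FROM sys.indexes
WHERE type = 6;  -- 6 = nonclustered columnstore
```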
How can I find out more about my columnstore indexes? Is there metadata?
There are two new catalog tables with data about columnstore indexes:
sys.column_store_segments
sys.column_store_dictionaries
VIEW DEFINITIONS permission on a table is required to see information in the catalog tables about a
columnstore index on that table. In addition, a user must have SELECT permission on the table to see
data in the following columns:
sys.column_store_segments:
has_nulls, base_id, magnitude, min_data_id, max_data_id, null_value, data_ptr
sys.column_store_dictionaries:
last_id, entry_count, data_ptr
A user who does not have SELECT permission on a table will see NULL as the value in the columns listed
above.
Is each partition of a columnstore index compressed separately?
Yes, each partition is compressed separately. Each partition has its own dictionaries. All segments within
a partition share dictionaries. Dictionaries for different partitions are independent. This allows partition
switching to be a metadata-only operation.
You can use the new catalog tables or sys.dm_db_partition_stats to determine how big the columnstore
indexes are on disk. A relatively simple query to get the size of one columnstore index is:
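One such sketch, totaling segment sizes for a single hypothetical table (dictionary space is covered by the fuller queries that follow):

```sql
SELECT SUM(css.on_disk_size) / 1024 / 1024 AS segment_size_mb
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.column_store_segments AS css
    ON p.hobt_id = css.hobt_id
WHERE i.type = 6                           -- nonclustered columnstore
  AND i.object_id = OBJECT_ID('mytable');  -- hypothetical table name
```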
Here are some other queries that total up column store component sizes.
-- total size
with total_segment_size as (
SELECT
SUM (css.on_disk_size)/1024/1024 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
)
,
total_dictionary_size as (
SELECT SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
)
select
segment_size_mb,
dictionary_size_mb,
segment_size_mb + isnull(dictionary_size_mb, 0) as total_size_mb
from total_segment_size
left outer join total_dictionary_size
on 1 = 1
go
-- It may be that not all the columns in a table will be or can be included
-- in a nonclustered columnstore index,
-- so we need to join to sys.index_columns to get the correct column id.
-- Per-column segment and dictionary sizes:
with segment_size_by_column as (
SELECT p.object_id AS table_id,
css.column_id,
SUM (css.on_disk_size)/1024/1024 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
GROUP BY p.object_id, css.column_id
)
,
dictionary_size_by_column as (
SELECT p.object_id AS table_id,
csd.column_id,
SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
GROUP BY p.object_id, csd.column_id
)
select Object_Name(s.table_id) as table_name, c.column_id,
col_name(s.table_id, c.column_id) as column_name, s.segment_size_mb,
d.dictionary_size_mb, s.segment_size_mb + isnull(d.dictionary_size_mb, 0) as total_size_mb
from segment_size_by_column s
join
sys.indexes i -- Join to Indexes system table
ON i.object_id = s.table_id
join
sys.index_columns c -- Join to Index columns
ON c.object_id = s.table_id
and i.index_id = c.index_id
and c.index_column_id = s.column_id -- Need to join the index_column_id with the column_id
left outer join
dictionary_size_by_column d
on s.table_id = d.table_id
and s.column_id = d.column_id
where i.type_desc = 'NONCLUSTERED COLUMNSTORE'
order by total_size_mb desc
go
Why is a columnstore index built from a heap larger than a columnstore index built on the same
data from a clustered B-tree?
The columnstore index has to store an extra bookmark column (containing the record id, or rid, for the
row) when the base table is a heap. The bookmark is 8 bytes long and unique. Hence, if you have 1
million rows, that's an extra 8MB to store, since the columnstore index cannot compress distinct values.
So, please keep that in mind when you build a columnstore index directly on top of a heap. If
compression is a high priority, consider building a clustered index before you build a nonclustered
columnstore index.
The query optimizer uses table statistics to help choose query plans. Tables with a columnstore index
can have statistics. The statistics are gathered from the underlying B-tree or heap on the table with the
columnstore, not from the columnstore itself. No statistics are created as a byproduct of creating a
columnstore index. This is different from creation of a B-tree, where statistics are created for the B-tree
key. See here for additional information about statistics and columnstore indexes.
For columnstore indexes in large data warehouses, we recommend you use the same best practices for
file group management as for clustered indexes for large fact tables described in the Fast Track 3.0
guidelines here: http://msdn.microsoft.com/en-us/library/gg605238.aspx. As the Fast Track guidelines
evolve, we expect to provide explicit guidance for filegroup placement of columnstore indexes.
Can I create a columnstore index on a table that contains a FILESTREAM column?
Yes. Although a FILESTREAM column can't be included in a columnstore index, other columns of the
table can.
I am running out of space in my PRIMARY file group with columnstores. How can I avoid this?
Metadata for each row group is kept in the primary file group in a set of internal system tables, even if
your tables are kept in other file groups. Every time a new row group is created, a little more space is
used in the primary file group. A row group typically contains about one million rows, although smaller
row groups can be created under certain conditions.
Each row in the column segment system table is 96 bytes. Total space for a rowgroup = Number of
columns * 96 bytes.
Each row in the dictionary system table is 64 bytes. Total space per rowgroup = Number of dictionaries
(primary + secondary) in the HoBt * 64.
Make sure to provide enough space in your primary file group to accommodate this metadata. For
example, a 300 column table could use close to 50,000 bytes per row group. If this table has ten billion
rows it will have about ten thousand row groups. This could take up to 500MB for the row group
metadata in the primary file group. Provision plenty of space in advance for the primary file group, or
leave autogrow on and provide enough raw disk space to accommodate the growth.
Batch mode processing uses a new iterator model for processing data a-batch-at-a-time instead of a-
row-at-a-time. A batch typically represents about 1000 rows of data. Each column within a batch is
stored as a vector in a separate area of memory, so batch mode processing is vector-based. Batch mode
processing also uses algorithms that are optimized for the multicore CPUs and increased memory
throughput that are found on modern hardware. Batch mode processing spreads metadata access costs
and other types of overhead over all the rows in a batch, rather than paying the cost for each
row. Batch mode processing operates on compressed data when possible and eliminates some of the
exchange operators used by row mode processing. The result is better parallelism and faster
performance.
How do I know whether batch mode processing is being used for my query?
Batch mode processing is only available for certain operators. Most queries that use batch mode
processing will have part of the query plan executed in row mode and part in batch mode. You can tell
whether batch mode processing is being used for an operator by looking at showplan. If you look at the
properties for a scan or other operator in the Actual Execution Plan, you will see two new properties:
EstimatedExecutionMode and ActualExecutionMode. Only EstimatedExecutionMode is displayed in the
Estimated Execution Plan. The values for these two properties can be either row or batch. There is also a
new operator for hash joins when they are being executed in batch mode.
The BatchHashTableBuild operator appears in graphical showplan and has a new icon.
The query optimizer chooses whether to use batch mode processing when it formulates the query plan.
Most of the time, EstimatedExecutionMode and ActualExecutionMode will have the same value,
either batch or row. At run time, two things can cause a query plan to be executed in row mode instead
of batch mode: not enough memory or not enough threads. The most common reason for the
ActualExecutionMode to be row when the EstimatedExecutionMode was batch is that there was a large
hash join and all the hash tables could not fit in memory. Batch mode processing uses special in-memory
hash tables. If the hash tables do not fit in memory, execution of the query reverts to using row mode
and traditional hash tables that can spill to disk. The other reason for changing to row mode is when not
enough threads are available for parallel execution. Serial execution always occurs in row mode. You can
tell that a fall back to serial execution occurred if the estimated query plan shows parallel execution but
the actual query plan is executed serially.
If the query executes in parallel but falls back to row mode processing, you can infer that memory was
the problem. There is also an xevent (batch_hash_table_build_bailout) that is fired when there is not
enough memory during hash join and the query falls back to row mode processing. If this happens,
incorrect cardinality estimation may have contributed to the problem. Check the cardinality estimation
and consider updating statistics on the table.
Can I get batch mode processing even if I don’t have a columnstore index?
No. Batch mode processing only occurs when a columnstore index is being used in the query.
What query execution plan operators are supported in batch mode in Denali?
Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build
What about the parallelism operators in batch mode hash joins? Why are they always in row
mode?
Some of the parallelism operators in query plans for batch mode hash joins are not needed in batch
mode. Although the operator appears in the query plan, the number of rows for the operator is zero
and the query does not incur the cost of redistributing rows among different threads. The operator
remains in the query plan because, if the hash join must spill to disk (if all the hash tables do not fit into
the memory allotted for the query), the query reverts to row mode when it spills to disk. The parallelism
operators are required for executing the query in row mode. If the hash join spills to disk you will see
the warning "Operator used tempdb to spill data during execution." If you look at the properties for the
parallelism operators (Repartition Streams), you will see that the actual number of rows is greater than
zero if the hash join has spilled.
SQL Server Columnstore Performance Tuning
Introduction
SQL Server columnstore indexes are new in the SQL Server 2012 release. They are designed to improve
query performance for data warehouses and data marts. This page describes query performance tuning
for columnstores.
Columnstore indexes can speed up some queries by a factor of 10X to 100X on the same hardware
depending on the query and data. These key things make columnstore-based query processing so fast:
The columnstore index itself stores data in highly compressed format, with each column
kept in a separate group of pages. This reduces I/O a lot for most data warehouse queries
because many data warehouse fact tables contain 30 or more columns, while a typical query
might touch only 5 or 6 columns. Only the columns touched by the query must be read from
disk. Only the more frequently accessed columns have to take up space in main memory. The
clustered B-tree or heap containing the primary copy of the data is normally used only to build
the columnstore, and will typically not be accessed for the large majority of query processing.
It'll be paged out of memory and won't take main memory resources during normal periods of
query processing.
There is a highly efficient, vector-based query execution method called "batch processing"
that works with the columnstore index. A "batch" is an object that contains about 1000 rows.
Each column within the batch is represented internally as a vector. Batch processing can reduce
CPU consumption 7X to 40X compared to the older, row-based query execution
methods. Efficient vector-based algorithms allow this by dramatically reducing the CPU
overhead of basic filter, expression evaluation, projection, and join operations.
Segment elimination can skip large chunks of data to speed up scans. Each partition in a
columnstore index is broken into one-million-row chunks called segments. Each segment has
metadata that stores the minimum and maximum value of each column for the segment. The
storage engine checks filter conditions against the metadata. If it can detect that no rows will
qualify then it skips the entire segment without even reading it from disk.
The storage engine pushes filters down into the scans of data. This eliminates data early
during query execution, improving query response time.
The columnstore index and batch query execution mode are deeply integrated into SQL Server. A
particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a
combination of batch and row-based processing. The key to getting the best performance is to make
sure your queries process the large majority of data in batch mode. Even if the bulk of your query
can't be executed in batch mode, you can still get significant performance benefits from columnstore
indexes through reduced I/O, and through pushing down of predicates to the storage engine.
To tell if the main part of your query is running in batch mode, look at the graphical showplan, hover the
mouse pointer over the most expensive scan operator (usually a scan of a large fact table) and check the
tooltip. It will say whether the estimated and actual execution mode was Row or Batch. See here for an
example.
Obeying the following do's and don'ts will help you get the most out of columnstores for your decision
support workload.
DOs
Put columnstore indexes on large tables only. Typically, you will put them on your fact tables
in your data warehouse, but not the dimension tables. If you have a large dimension table,
containing more than a few million rows, then you may want to put a columnstore index on it as
well.
Include every column of the table in the columnstore index. If you don't, then a query that
references a column not included in the index will benefit little, or not at all, from the
columnstore index.
Structure your queries as star joins with grouping and aggregation as much as possible.
Avoid joining pairs of large tables. Join a single large fact table to one or more smaller
dimensions using standard inner joins. Use a dimensional modeling approach for your data as
much as possible to allow you to structure your queries this way.
Use best practices for statistics management and query design. This is independent of
columnstore technology. Use good statistics and avoid query design pitfalls to get the best
performance. See the white paper on SQL Server statistics for guidance. In particular, see the
section "Best Practices for Managing Statistics."
DON'Ts
(Note: we are already working to improve the implementation to eliminate limitations associated with
these "don'ts" and we anticipate fixing them sometime after the SQL Server 2012 release. We're not
ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.
Avoid joins and string filters directly on columns of columnstore-indexed tables. String
filters don't get pushed down into scans on columnstore indexes, and join processing on strings
is less efficient than on integers. Filters on number and date types are pushed down. Consider
using integer codes (or surrogate keys) instead of strings in columnstore indexed fact tables. You
can move the string values to a dimension table. Joins on the integer columns normally will be
processed very efficiently.
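A sketch of this refactoring (the table and column names below are illustrative, not from any
particular sample database):
-- string filter applied directly to the fact table: not pushed down to the scan
SELECT SUM(f.SalesAmount)
FROM dbo.FactSales f
WHERE f.ProductName = N'Widget' ;
-- the string moved to a dimension, joined on an integer surrogate key:
-- the join and filter are processed much more efficiently
SELECT SUM(f.SalesAmount)
FROM dbo.FactSales f
JOIN dbo.DimProduct d ON f.ProductKey = d.ProductKey
WHERE d.ProductName = N'Widget' ;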
Avoid use of OUTER JOIN on columnstore-indexed tables. Outer joins don't benefit from
batch processing. Instead, SQL Server 2012 reverts to row-at-a-time processing.
Avoid use of NOT IN on columnstore-indexed tables. NOT IN (<subquery>) (which internally
uses an operator called "anti-semi-join") can prevent batch processing and cause the system to
revert to row mode. NOT IN (<list of constants>) typically works fine though.
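For example (illustrative names), the first query below can revert to row mode, while the second
typically stays in batch mode:
-- NOT IN (<subquery>): may prevent batch processing
SELECT COUNT(*)
FROM dbo.FactSales
WHERE ProductKey NOT IN ( SELECT ProductKey FROM dbo.DimDiscontinued ) ;
-- NOT IN (<list of constants>): typically fine
SELECT COUNT(*)
FROM dbo.FactSales
WHERE ProductKey NOT IN ( 1, 2, 3 ) ;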
Avoid use of UNION ALL to directly combine columnstore-indexed tables with other
tables. Batch processing doesn't get pushed down over UNION ALL. So, for example, creating a
view vFact that does a UNION ALL of two tables, one with a columnstore indexes and one
without, and then querying vFact in a star join query, will not use batch processing.
Follow the links to the topics listed below about how to maximize performance with columnstore
indexes, and how to work around their functional and performance limitations in SQL Server 2012.
Introduction
SQL Server 2012 introduces two new functions that simplify CASE expressions: IIF and CHOOSE.
COALESCE, an older statement that simplifies a CASE expression for NULL-related logic, has also
been available since early versions. Although ISNULL is a function that logically simplifies a CASE
expression, it never translates to a CASE expression behind the scenes (as the execution plan
shows). We will also cover ISNULL in this article, as it is an alternative to COALESCE. The goal of
this article is to provide an in-depth tutorial about these statements:
1. ISNULL
2. COALESCE
3. IIF
4. CHOOSE
I prefer the term "statement" because, although they do similar jobs, they are not in the same
category by purpose. For example, ISNULL is a function while COALESCE is an expression.
As we will see later, the main purpose of introducing these statements is to improve code
readability and achieve cleaner code. Using these statements may result in poor performance in some
situations, so we will also discuss alternative solutions.
This article targets all levels of readers, from newbies to advanced. If you are already familiar
with these statements, you may prefer to skip the Definition section.
Definition
ISNULL
ISNULL(expr_1, expr_2)
If expr_1 is NULL, the ISNULL function returns expr_2; otherwise it returns expr_1. The following
example shows its functionality.
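A minimal sketch of this behavior (the variable names and values are illustrative):
DECLARE @Val_1 INT ,
        @Val_2 INT ;
SET @Val_1 = NULL ;
SET @Val_2 = 500 ;
SELECT ISNULL(@Val_1, @Val_2) AS Result ; -- returns 500, because @Val_1 is NULL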
Output:
When the data types of the two arguments are different, SQL Server converts one to the other if
they are implicitly convertible; otherwise it returns an error. Executing the following code
results in an error, as illustrated in the output figure.
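A sketch of such a failing conversion (the values are illustrative):
DECLARE @Val_1 INT ,
        @Val_2 NVARCHAR(10) ;
SET @Val_2 = N'ABC' ;
-- N'ABC' cannot be implicitly converted to INT, so this raises a conversion error
SELECT ISNULL(@Val_1, @Val_2) ;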
Output:
If we change the value of variable @Val_2 to '500', we do not encounter any error, because this
value is convertible to the numeric data type INT. The following code shows this:
Implicit conversion may lead to data truncation. This happens when the length of the expr_1 data
type is shorter than the length of the expr_2 data type, so it is better to convert explicitly if
needed. In the next example, the first output column suffers from value truncation while the second
does not.
There are a few rules that determine the data type of the output column generated via ISNULL. The
next code illustrates these rules:
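ISNULL takes the data type of its first argument for the output column. A sketch of the setup that
the INFORMATION_SCHEMA query below inspects (the declarations mirror those in the COALESCE revision
later in this article):
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
    DROP TABLE dbo.TestISNULL ;
GO
DECLARE @Val_1 NVARCHAR(200) ,
        @Val_2 DATETIME ;
SELECT ISNULL(@Val_1, @Val_2) AS Col_1 , -- typed after @Val_1: NVARCHAR(200)
       ISNULL(@Val_2, @Val_1) AS Col_2   -- typed after @Val_2: DATETIME
INTO dbo.TestISNULL ;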
SELECT COLUMN_NAME ,
DATA_TYPE ,
CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
Output:
The following code illustrates the NULL-ability of the output column generated via ISNULL:
SELECT COLUMN_NAME ,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
Output
COALESCE
COALESCE returns the first NOT NULL expression in the expression list. It needs at least two
expressions. Unlike the ISNULL function, COALESCE is not a function; rather, it is an expression.
COALESCE always translates to a CASE expression. For example,
COALESCE(expr_1, expr_2, expr_3)
is equivalent to:
CASE
    WHEN expr_1 IS NOT NULL THEN expr_1
    WHEN expr_2 IS NOT NULL THEN expr_2
    ELSE expr_3
END
Therefore the database engine handles it exactly as it handles a CASE expression, which is why
COALESCE belongs in our list of simplified CASE expression statements.
Following code is one of many samples that could illustrate different execution plans for COALESCE and
ISNULL:
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE ISNULL(ProductID, SpecialOfferID) = 3 ;
SELECT *
FROM Sales.SalesOrderDetail
WHERE COALESCE(ProductID, SpecialOfferID) = 3 ;
By using COALESCE, we avoid the limitations discussed for the ISNULL function, regarding both the
output column data type and the output column NULL-ability. There is also no more value truncation.
The next example revises the ISNULL section examples, replacing ISNULL with COALESCE:
-- value truncation
DECLARE @Val_1 NVARCHAR(2) ,
@Val_2 NVARCHAR(10) ;
----------------------------------------------------------
-- output data type
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
DECLARE @Val_1 NVARCHAR(200) ,
@Val_2 DATETIME ;
SELECT COLUMN_NAME ,
DATA_TYPE ,
CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
GO
----------------------------------------------------------
-- NULL-ability
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
SELECT COLUMN_NAME ,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
GO
Output
IIF
IIF( condition , x, y)
IIF is a logical function introduced in SQL Server 2012. It is like the conditional (ternary)
operator in the C# language: when the condition is true, x is evaluated; otherwise, y is evaluated.
The following example illustrates this function's usage.
DECLARE @x NVARCHAR(10) ,
@y NVARCHAR(10) ;
SET @x = N'True' ;
SET @y = N'False' ;
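A call that exercises these variables might look like this (the condition is illustrative):
SELECT IIF(1 > 0, @x, @y) AS Result ; -- the condition is true, so @x (N'True') is returned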
Like the COALESCE expression, the IIF function always translates to a CASE expression. For
instance, IIF(condition, true_value, false_value) is equivalent to:
CASE
WHEN (condition is true) THEN (true_value)
ELSE (false_value)
END
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE IIF ( OrderQty >= SpecialOfferID , OrderQty, SpecialOfferID ) = 1
CHOOSE
CHOOSE is a selection function introduced in SQL Server 2012. It is like the switch statement in
the C# language: it returns the item at the given 1-based index in the list of values. If the index
(which must be convertible to data type INT) is NULL or out of range, the output is NULL. This
function needs at least two arguments: one for the index and one or more values. The following code
illustrates this function's usage.
SET @index = 2 ;
For example (the values here are illustrative), CHOOSE(@index, 'x', 'y', 'z')
is equivalent to:
CASE
    WHEN @index = 1 THEN 'x'
    WHEN @index = 2 THEN 'y'
    WHEN @index = 3 THEN 'z'
END
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE CHOOSE(OrderQty, 'Black', 'White', 'Green') = 'White'
Performance
Although the main purpose of the simplified CASE expression statements is to increase readability
and produce cleaner code, one important question is how these statements impact database
performance. Is there any performance difference between a CASE expression and these statements? As
we will see, to achieve the best performance it is often better to find alternative solutions and
avoid using CASE and these statements.
Dynamic filtering
It is common to write reports that accept input parameters. For better performance, it is good
practice to put their code in stored procedures, because a procedure caches the way it executes as
an execution plan and reuses it. There are several popular approaches to writing this type of
procedure.
IS NULL and OR
This is the most common solution. Let me start with an example, which we will then rewrite using
the comparable solutions:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE (a.ModifiedDate = @ModifiedDate OR @ModifiedDate IS NULL)
AND (b.ShipDate = @ShipDate OR @ShipDate IS NULL)
AND (c.StoreID = @StoreID OR @StoreID IS NULL)
GO
-----------------------------------------------
-- now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
The main problem here, as illustrated in the figure above, is that the same execution plan is used
in all three situations. It is obvious that the third one suffers from an inefficient execution
plan.
CASE
We can replace the combination of IS NULL and OR by translating it into CASE. We now rewrite the
above code like this:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate
= CASE WHEN @ModifiedDate IS NOT NULL THEN @ModifiedDate ELSE a.ModifiedDate END
AND b.ShipDate
= CASE WHEN @ShipDate IS NOT NULL THEN @ShipDate ELSE b.ShipDate END
AND c.StoreID
= CASE WHEN @StoreID IS NOT NULL THEN @StoreID ELSE c.StoreID END
GO
-----------------------------------------------
-- now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
Using CASE shows improvement over IS NULL and OR, though with more CPU cost for the first
execution. The Reads and Actual Rows also decreased in the first two executions. So it is better,
but we continue our experiment.
COALESCE
We can also replace the CASE expressions with COALESCE. We now rewrite the above code like this:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate = COALESCE(@ModifiedDate, a.ModifiedDate)
AND b.ShipDate = COALESCE(@ShipDate, b.ShipDate)
AND c.StoreID = COALESCE(@StoreID, c.StoreID)
GO
-----------------------------------------------
-- now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
Because COALESCE translates internally to CASE, there is, as expected, no difference between them.
ISNULL
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate = ISNULL(@ModifiedDate, a.ModifiedDate)
AND b.ShipDate = ISNULL(@ShipDate, b.ShipDate)
AND c.StoreID = ISNULL(@StoreID, c.StoreID)
GO
-----------------------------------------------
-- now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
Dynamic SQL
Using the above four solutions we could not achieve good performance, because we need a different,
efficient execution plan for each combination of input parameters. So it is time to use an
alternative solution to overcome this problem.
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
DECLARE @sql NVARCHAR(MAX), @parameters NVARCHAR(4000) ;
SET @parameters =
'@xModifiedDate AS DATETIME ,
@xShipDate AS DATETIME ,
@xStoreID AS INT' ;
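-- The statements that build and run the query are sketched below; the exact
-- query text is illustrative, but the pattern (append only the filters whose
-- parameters were supplied, then call sp_executesql) is the classic
-- dynamic-search approach.
SET @sql = N'
SELECT b.ShipDate , c.StoreID , a.UnitPriceDiscount , b.RevisionNumber ,
       b.DueDate , b.ShipDate , b.PurchaseOrderNumber , b.TaxAmt ,
       c.PersonID , c.AccountNumber , c.StoreID
FROM Sales.SalesOrderDetail a
     RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
     LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE 1 = 1' ;
IF @ModifiedDate IS NOT NULL
    SET @sql += N' AND a.ModifiedDate = @xModifiedDate' ;
IF @ShipDate IS NOT NULL
    SET @sql += N' AND b.ShipDate = @xShipDate' ;
IF @StoreID IS NOT NULL
    SET @sql += N' AND c.StoreID = @xStoreID' ;
EXEC sp_executesql @sql , @parameters ,
    @xModifiedDate = @ModifiedDate ,
    @xShipDate = @ShipDate ,
    @xStoreID = @StoreID ;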
GO
-----------------------------------------------
-- now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
There is no doubt that this solution is the best one! Here is the comparison chart. (lower is
better)
You can find more information about the last solution on Erland Sommarskog's website.
This is another common problem that fits our discussion: concatenating row values into a single
delimited string. In this example we cover only the COALESCE and ISNULL solutions, and at the end
we will see an alternative solution that performs better than the CASE-based solutions.
COALESCE
The next code concatenates the values of the column "ProductID", delimiting them with a comma
separator.
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
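-- The concatenation statement is sketched here; this is the classic
-- COALESCE accumulation pattern (illustrative).
SELECT @sql = COALESCE(@sql + N',', N'') + CAST(ProductID AS NVARCHAR(20))
FROM Sales.SalesOrderDetail ;
SELECT @sql AS ConcatenatedIDs ;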
Execution statistics:
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
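-- Sketch of the ISNULL variant of the same accumulation (illustrative).
SELECT @sql = ISNULL(@sql + N',', N'') + CAST(ProductID AS NVARCHAR(20))
FROM Sales.SalesOrderDetail ;
SELECT @sql AS ConcatenatedIDs ;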
Execution statistics:
XML
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
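-- Sketch of the FOR XML PATH variant of the same concatenation
-- (illustrative); STUFF removes the leading comma.
SET @sql = STUFF( ( SELECT N',' + CAST(ProductID AS NVARCHAR(20))
                    FROM Sales.SalesOrderDetail
                    FOR XML PATH('') ), 1, 1, N'' ) ;
SELECT @sql AS ConcatenatedIDs ;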
The duration decreased to 21 milliseconds. Here is the comparison chart. (lower is better)
It is common to use the CHOOSE function to write cleaner code. But is it the best solution for
achieving optimal performance? In this section we discuss that question.
CHOOSE
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE CHOOSE(OrderQty, 'J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A')
IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL )
GO
Execution statistics:
Now we rewrite the above code, using a table-valued function to produce the CHOOSE list:
USE AdventureWorks2012 ;
GO
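-- Sketch of a dbo.ufnLookup definition consistent with the join below: an
-- inline table-valued function whose letter values mirror the permanent
-- lookup table created later in this section (illustrative).
IF OBJECT_ID('dbo.ufnLookup', 'IF') IS NOT NULL
    DROP FUNCTION dbo.ufnLookup ;
GO
CREATE FUNCTION dbo.ufnLookup ()
RETURNS TABLE
AS
RETURN
( SELECT Indexer , val
  FROM ( VALUES ( 1, 'J' ), ( 2, 'I' ), ( 3, 'H' ), ( 4, 'G' ), ( 5, 'F' ),
                ( 6, 'E' ), ( 7, 'D' ), ( 8, 'C' ), ( 9, 'B' ), ( 10, 'A' )
       ) AS v ( Indexer, val )
) ;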
GO
SELECT *
FROM Sales.SalesOrderDetail a
JOIN dbo.ufnLookup() b ON a.OrderQty = b.Indexer
WHERE b.val IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL ) ;
Execution statistics:
USE AdventureWorks2012 ;
GO
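-- Sketch of the permanent lookup table's definition (illustrative; the Id
-- and val columns match the INSERT and the join that follow).
IF OBJECT_ID('dbo.LookupTable', 'U') IS NOT NULL
    DROP TABLE dbo.LookupTable ;
CREATE TABLE dbo.LookupTable
( Id INT NOT NULL PRIMARY KEY CLUSTERED ,
  val CHAR(1) NOT NULL
) ;
GO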
INSERT dbo.LookupTable
( id, val )
SELECT 1 AS Indexer, 'J' AS val
UNION ALL
SELECT 2, 'I'
UNION ALL
SELECT 3, 'H'
UNION ALL
SELECT 4, 'G'
UNION ALL
SELECT 5, 'F'
UNION ALL
SELECT 6, 'E'
UNION ALL
SELECT 7, 'D'
UNION ALL
SELECT 8, 'C'
UNION ALL
SELECT 9, 'B'
UNION ALL
SELECT 10, 'A' ;
GO
SELECT *
FROM Sales.SalesOrderDetail a
JOIN dbo.LookupTable b ON a.OrderQty = b.Id
WHERE b.val IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL )
The duration decreased to 173 milliseconds. The next figure shows the comparison chart between
these solutions. (lower is better)
This solution is the best one. As the number of values in the parameter list of the CHOOSE function
increases, performance decreases. So by using a permanent lookup table that benefits from a
physical index, we can achieve the best performance.
More Readability
The most important goal of using these simplified CASE statements is achieving cleaner code. We
often encounter code so large that the SELECT list alone runs to more than a hundred lines, which
is a significant reason to use these statements. I faced a simple problem just a few years ago. At
first sight it seemed the solution should be very simple, but after writing the code using CASE, I
found that I was in trouble. The problem was simple: assume a department store has two discount
plans, one based on the purchase amount and the other based on the distance from the customer's
home to the store, but only the greater of the two discounts is applicable. The next code shows two
solutions, the first using CASE and the second using IIF.
SELECT *
FROM #temp
-- solution using CASE
SELECT
CASE
WHEN
CASE WHEN Bill < 10.00 THEN 10 ELSE 20 END > CASE WHEN Distance <
10 THEN 7 ELSE 13 END
THEN CASE WHEN Bill < 10.00 THEN 10 ELSE 20 END
ELSE CASE WHEN Distance < 10 THEN 7 ELSE 13 END
END AS Discount
FROM #temp
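The IIF version can be sketched as a direct translation of the nested CASE above (same thresholds;
only the syntax is more compact):
-- solution using IIF
SELECT
    IIF( IIF(Bill < 10.00, 10, 20) > IIF(Distance < 10, 7, 13),
         IIF(Bill < 10.00, 10, 20),
         IIF(Distance < 10, 7, 13) ) AS Discount
FROM #temp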
Conclusion
Using simplified CASE expression statements results in cleaner code and speeds up development
time, but they show poor performance in some situations. So if we are in the performance tuning
phase of software development, it is better to consider alternative solutions.
Structured Error Handling Mechanism in SQL Server 2012
The goal of this article is to provide a simple and easy to use error handling mechanism with minimum
complexity.
Problem definition
There are many questions in the MSDN forums and other Internet communities about error handling in
SQL Server, such as these:
Introduction
There are many articles written by the best experts in this area, and there are complete references
about error handling in SQL Server. The goal of this article is to provide a simple, easy-to-use
error handling mechanism with minimum complexity. I will therefore approach this topic from a
problem-solving perspective, focusing on SQL Server 2012. The road map of this article is to cover
the above questions and to provide a step-by-step tutorial for designing a structured error
handling mechanism in SQL Server 2012 procedures.
Solution
Yes, there is. The TRY/CATCH construct is the structured mechanism for error handling in SQL Server
2005 and later. This construct has two parts: we try executing statements in the TRY block and
handle any errors that occur in the CATCH block. Therefore, the simplest error handling structure
can look like this:
TRY
o Try executing statements
CATCH
o Handle the errors if they occur
Here is a sample code to provide the above structure in the simplest form:
SET NOCOUNT ON;
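-- A minimal skeleton of the structure (the statements are illustrative):
BEGIN TRY
    SELECT 1 / 0 ; -- try executing a statement that fails
END TRY
BEGIN CATCH
    PRINT N'An error occurred.' ; -- handle the error
END CATCH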
--result
When executing statements in the TRY block, if an error occurs, the flow of execution transfers to
the CATCH block. So the answer is NO!
We can see this behavior with an example. After executing the following code, statement no. 3 is
never attempted, because the flow of execution transfers to the CATCH block as soon as statement
no. 2 raises an error.
SET NOCOUNT ON;
--result
No. The role of the TRY/CATCH construct is just to provide a mechanism for trying to execute SQL
statements. We need other constructs or statements, which I explain later, to handle the errors in
the CATCH block. For instance, the following code tries to execute a divide-by-zero statement. It
does not automatically handle any errors: when the error occurs, flow control immediately transfers
to the CATCH block, but in the CATCH block we do not have any statement to tell us that there was
an error!
SET NOCOUNT ON;
SELECT 1 / 0; -- Statement
In the CATCH block we can handle the error and send the error message to the application, so we
need an element that shows which error occurred. This element is RAISERROR. The error handling
structure could therefore be like this:
TRY
o Try executing statements
CATCH
o Handle the error if occurs
RAISERROR
SELECT 1 / 0; -- Statement
The RAISERROR itself needs other elements to identify the error number, error message, etc. Now we
can complete the error handling structure:
TRY
SELECT 1 / 0; -- Statement
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
--result
From a modular programming approach, it is recommended to create a stored procedure that does the
RAISERROR job. But I believe that using a modular procedure (I call it spErrorHandler) to re-raise
errors is not a good idea. Here are my reasons:
1. When we call RAISERROR in the procedure spErrorHandler, we have to add the name of the procedure
in which the error occurred to the error message. This will confuse the application end users
(customers). A customer does not want to know which part of his car is damaged; he prefers that the
car simply send him a message telling him there is an error in its functions. In the software world
it is more important to send a simple message to the customer, because if we send a complex error
message, he will be afraid of what will happen to his critical data!
2. If we accept the first reason and decide to resolve this issue, we need to send a simple message
to the client application. We then lose the name of the procedure in which the error occurred, and
other information useful for debugging, unless we insert this information into an error-log table.
You can test this scenario with the following code:
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
go
-----------------------------------------
SELECT 1 / 0; -- Statement
EXEC spErrorHandler;
go
exec spTest;
--result
As illustrated in this figure, when spErrorHandler is used, the values of ERROR_PROCEDURE() and
ERROR_NUMBER() change in the output. This behavior is caused by RAISERROR, which always raises a
new exception, so spErrorHandler always reports that the value of ERROR_PROCEDURE() is simply
"spErrorHandler". As I said before, there are two workarounds for this issue. The first is to
concatenate this useful data with the error message and raise it, as discussed in reason one. The
second is to insert this useful data into another table just before re-raising the error in
spErrorHandler.
SELECT 1 / 0; -- Statement
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
--result
As you see in this figure, the procedure name and error number are correct. Incidentally, when a
customer reports an error, I prefer to go to SQL Server Profiler, simulate the environment
completely, and test those SQL statements in SSMS to recreate the error and debug it based on the
correct error number and procedure name.
In the THROW section, I will explain that the main advantage of THROW over RAISERROR is that it shows
the correct line number of the code that raises the error, which is so helpful for a developer in
debugging his code.
3. Furthermore, with the THROW statement introduced in SQL Server 2012, there is no need to write
extra code in the CATCH block. Therefore there is no need for a separate procedure, except for
tracking the errors in an error-log table; such a procedure is not an error handler but an error
tracker. I explain the THROW statement in the next section.
The main objective of error handling is that the customer knows an error occurred and reports it to
the software developer, who can then quickly find the reason for the error and improve his code. In
fact, error handling is a mechanism that eliminates the blindness of both customer and developer.
To improve this mechanism Microsoft SQL Server 2012 introduced the THROW statement. Now I will
address the benefits of THROW over RAISERROR.
As I said earlier, this is the main advantage of using THROW. The following code highlights this
feature:
SELECT 1/0
END TRY
BEGIN CATCH
THROW
END CATCH
go
exec sptest
--result
As you can see in this figure, the error line number that RAISERROR reports is always the line
number of the RAISERROR call itself. The error line number reported by THROW, however, is line 6 in
this example, which is the line where the error actually occurred.
Easy to use
Another benefit of the THROW statement is that it does not require the extra code that RAISERROR
needs.
Complete termination
The severity level raised by THROW is always 16. But the more important feature is that when the
THROW statement in a CATCH block is executed, any code after it never runs. The following sample
script shows how this feature protects the code, compared to RAISERROR:
SELECT 1/0
END TRY
BEGIN CATCH
declare @msg nvarchar(2000) = error_message();
raiserror( @msg , 16, 1);
SELECT *
FROM #Saeid;
This feature also makes it possible to re-throw custom message numbers without needing
sp_addmessage to register the number.
SELECT 1/0
END TRY
BEGIN CATCH
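-- Sketch: any message number of 50000 or higher can be thrown directly,
-- without registering it via sp_addmessage (the number is illustrative).
THROW 60000, N'A custom error message.', 1 ;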
END CATCH
go
exec sptest
Tip
The statement before a THROW statement must be terminated by the semicolon (;) statement
terminator.
I want to check a condition in the TRY block. How can I control the flow of execution and
raise the error?
The answer is to use THROW inside the TRY block. Its severity level is 16, so it terminates
execution in the TRY block, and we know that when any statement in the TRY block encounters an
error, execution immediately transfers to the CATCH block. So the main idea is to THROW a custom
error, as in this code:
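BEGIN TRY
    -- illustrative condition; when it holds, throw a custom error so that
    -- execution transfers to the CATCH block
    IF 1 = 1
        THROW 60000, N'A custom error raised in the TRY block.', 1 ;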
END TRY
BEGIN CATCH
THROW
END CATCH
go
exec sptest
--result
As you can see, we handle the error step by step. In the next session we will complete this structure.
Does the CATCH part automatically rollback the statements within the TRY part?
This is a misconception that I sometimes hear. I will explain with a small example. After executing
the following code, the table "dbo.Saeid" still exists, which demonstrates that the TRY/CATCH block
does not implement implicit transactions.
CREATE PROC sptest
AS
SET NOCOUNT ON;
BEGIN TRY
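    -- illustrative body: the table is created, then an error is raised;
    -- without an explicit transaction, the CREATE TABLE is not undone
    CREATE TABLE dbo.Saeid ( Id INT ) ;
    SELECT 1 / 0 ;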
END TRY
BEGIN CATCH
THROW
END CATCH
go
-------------------------------------------
EXEC sptest;
go
SELECT *
FROM dbo.Saeid;
--result
The previous question showed that if we want to roll back all the statements in a TRY block, we
need to use an explicit transaction in the TRY block. But the main question here is:
"Where is the right place to commit and rollback?"
That is a complex discussion that I will not jump into in this article. But there is a simple
template that we can use for procedures (not triggers!).
This is that template:
CREATE PROC sptest
AS
SET NOCOUNT ON;
BEGIN TRY
SET XACT_ABORT ON; --set xact_abort option
BEGIN TRAN --begin transaction
SELECT 1/0
COMMIT TRAN --commit transaction
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 --rollback all open transactions before re-throwing
    ROLLBACK TRAN ;
THROW
END CATCH
go
EXEC sptest;
go
SELECT * FROM dbo.Hasani;
The elements of this structure are:
TRY block
o XACT_ABORT
o Begin transaction
Statements to try
o Commit transaction
CATCH block
o Check @@TRANCOUNT and rollback all transactions
o THROW
XACT_ABORT
In general it is recommended to set the XACT_ABORT option to ON in the TRY/CATCH blocks of our
procedures. When this option is ON and a run-time error occurs, the entire transaction is
terminated and any user-defined transaction is rolled back automatically.
@@TRANCOUNT
We check this global variable to ensure there is no open transaction. If there is an open
transaction, it is time to execute the rollback statements. This check is a must in every CATCH
block, even if the procedure has no transactions of its own. An alternative is to use XACT_STATE().
Conclusion
The introduction of the THROW statement is a big step forward for error handling in SQL Server
2012, because it lets database developers see the accurate line numbers of failing procedure code.
This article provided a simple, easy-to-use error handling mechanism with minimum complexity using
SQL Server 2012. There are some more complex situations that I did not cover in this article; if
you need to dive deeper, see the articles in the See Also section.
Problem definition
The first rule says that triggers are part of the invoking transaction (the transaction that fired them).
Yes, this is true, and it means that at the beginning of the trigger, both @@TRANCOUNT and XACT_STATE() return "1". So, if we use COMMIT or ROLLBACK inside the trigger, their values change to "0" just after executing those statements.
The second, stranger rule is that if the transaction is ended inside the trigger, the database raises an abort error. An example of this rule is executing COMMIT or ROLLBACK within the trigger.
-- declare variables
DECLARE @trancount CHAR(1) ,
@XACT_STATE CHAR(1) ;
END ;
GO
-- test time!
INSERT dbo.Test ( Name )
VALUES ( N'something' ) ;
Solution
Classic Solution
This solution uses the second rule to rollback trigger and raise an error. The following code shows this
mechanism:
-- create test table
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
CREATE TABLE dbo.Test
( Id INT IDENTITY PRIMARY KEY,
NAME NVARCHAR(128)
) ;
GO
-- create test trigger
CREATE TRIGGER dbo.TriggerForTest
ON dbo.Test
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
IF 1 = 1
BEGIN
-- rollback and end the transaction inside the trigger
ROLLBACK TRAN ;
-- raise an error
RAISERROR ( 'Error Message!', 16, 1) ;
END
END ;
GO
-- test time!
INSERT dbo.Test ( Name )
VALUES ( N'something' ) ;
This solution works fine as long as RAISERROR is the last statement in the trigger. If there are statements after RAISERROR, they will still execute, as shown in the next code:
IF 1 = 1
BEGIN
    -- rollback and end the transaction inside the trigger
    ROLLBACK TRAN ;
    -- raise an error
    RAISERROR ( 'Error Message!', 16, 1) ;
    -- this statement still executes after RAISERROR!
    PRINT 'Still running...' ;
END
Modern Solution
This solution is applicable to SQL Server 2012 and above. The THROW statement enhances error handling in triggers: it rolls back the statements and throws an error message. The next code shows this mechanism:
-- recreate the trigger, this time using THROW
IF 1 = 1
    -- just throw!
    THROW 60000, 'Error Message!', 1 ;
END ;
GO
-- test time!
INSERT dbo.Test ( Name ) VALUES ( N'something' ) ;
SELECT * FROM dbo.Test ;
Conclusion
As I explained in the former article, the introduction of the THROW statement was a major advance in SQL Server 2012 error handling. This article proves it again, this time with triggers.
Custom Sort in Acyclic Digraph
Problem definition
This article is derived from this MSDN forum post. This article addresses the task of how to present
a Tree in a custom order. In fact, the article title could be pre-order tree traversal.
Vocabulary
A digraph (directed graph) is a set of nodes connected by edges, where the edges have a direction associated with them.
Acyclic Digraph
An acyclic digraph (directed acyclic graph), also known as a DAG, is a directed graph with no directed
cycles.
Topological Ordering
Every DAG has a topological ordering, an ordering of the vertices such that the starting endpoint of
every edge occurs earlier in the ordering than the ending endpoint of the edge.
Solution
The code below resolves the stated problem of how to present a non-topological ordering of a DAG (i.e.,
custom sorting an acyclic digraph). Executing the following script will create and populate a resultant
test table demonstrating the stated solution.
The image below shows the sample data used in this solution.
The solution is to produce paths that differ from topological ordering. In the following code, changing
the ORDER BY list in the ROW_NUMBER function changes the sort order, producing paths that differ
from the topological ordering.
WITH Subs
AS ( SELECT Childid ,
            1 AS lvl ,
            CAST(1 AS VARBINARY(MAX)) AS PathSort
     FROM   #test
     WHERE  Childid = @rootId
     UNION ALL
     SELECT C.Childid ,
            P.lvl + 1 ,
            P.PathSort + CAST(ROW_NUMBER() OVER ( PARTITION BY C.parentid
                                                  ORDER BY C.Childid ) AS BINARY(5))
     FROM   Subs AS P
            JOIN #test AS C ON C.parentid = P.Childid
   )
SELECT  Childid ,
        ROW_NUMBER() OVER ( ORDER BY PathSort ) AS CustomSort ,
        REPLICATE(' | ', lvl) + CAST(Childid AS NVARCHAR(100)) AS ChildInTree
FROM    Subs
ORDER BY CustomSort;
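The script above references a #test table and a @rootId variable that are not reproduced in this excerpt. A minimal setup sketch; the edge-list shape matches what the query expects, but the sample values themselves are assumptions:

```sql
-- hypothetical sample data; the original table definition was not preserved here
CREATE TABLE #test ( parentid INT NULL, Childid INT NOT NULL ) ;

INSERT INTO #test ( parentid, Childid )
VALUES ( NULL, 1 ),          -- root node
       ( 1, 2 ), ( 1, 3 ),   -- children of 1
       ( 2, 4 ), ( 3, 5 ) ;  -- grandchildren

DECLARE @rootId INT = 1 ;    -- starting node for the traversal
```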
String Functions
Patindex Case Sensitive Search
This article is the result of quick research into the problem of using PATINDEX to perform a case-sensitive search against a case-insensitive column. Books Online does not show examples of how to use a particular collation with the PATINDEX function. A relevant thread in the MSDN Transact-SQL forum showed the syntax.
Thanks to Jeff Moden, I found that I can use a binary collation to be able to use ranges in the search.
So, if we want to split proper names such as JohnDoe, EdgarPo, etc. into two parts, we can use the
following code:
DECLARE @t TABLE ( Col VARCHAR(100) ) ;  -- table variable assumed from context

INSERT INTO @t
SELECT 'JohnDoe'
UNION ALL
SELECT 'AvramLincoln' ;

SELECT Col ,
       COALESCE(STUFF(Col, NULLIF(PATINDEX('%[a-z][A-Z]%',
                Col COLLATE Latin1_General_BIN), 0) + 1, 0, ' '), Col) AS NewCol
FROM @t ;
Hopefully this article will help others looking for a case-sensitive search solution in SQL Server.
Remove Leading and Trailing Zeros
In this post I have consolidated a few of the methods to remove leading and trailing zeros in a string.
Here is an example:
-- NN - note, this method will only work if the data are clean
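The example code itself did not survive conversion; here is a sketch of one common PATINDEX-based approach (the sample value is an assumption):

```sql
DECLARE @s VARCHAR(50) = '000123400' ;

-- strip leading zeros: find the first non-zero character and cut from there
-- (the appended '.' guarantees a match even for an all-zero string)
SELECT SUBSTRING(@s, PATINDEX('%[^0]%', @s + '.'), LEN(@s)) AS NoLeadingZeros ;

-- strip trailing zeros: reverse, strip leading zeros, reverse back
SELECT REVERSE(SUBSTRING(REVERSE(@s),
               PATINDEX('%[^0]%', REVERSE(@s) + '.'), LEN(@s))) AS NoTrailingZeros ;
```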
One of the commonly asked questions in the Transact-SQL forum on MSDN is how to filter rows containing bad characters. Often these bad characters are not known; say, in one of the recent posts the question was to filter all rows containing characters greater than ASCII 127.
The first step towards a solution is to realize that in order to quickly filter out something, we first need to know the list of allowed characters.
I will now show several samples of how important it is to know the "good" characters in order to filter the "bad" ones.
Let's suppose we only want alphanumeric characters to remain and everything else should be considered bad rows.
For all our examples let's create the following table variable:
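The table-variable definition was lost in conversion; based on the queries later in this article, it presumably looked something like this (the sample rows are assumptions shaped to exercise the patterns discussed below):

```sql
DECLARE @TableWithBadRows TABLE ( id INT IDENTITY(1,1), description VARCHAR(100) ) ;

INSERT INTO @TableWithBadRows ( description )
VALUES ( 'GoodRow1' ),
       ( 'Good Row 2' ),                -- contains a space
       ( 'Bad' + CHAR(200) + 'Row' ) ; -- contains È (CHAR(200))
```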
where a-z means the range of all letters from a to z, 0-9 means the range of all numbers from 0 to 9, and ^ means everything which is NOT one of the following characters.
The above code will return the last 2 rows. The second row is returned because it contains a space character, which was not included in the list of allowed characters.
Now, what should we do if we want to keep all the "normal" characters and only disallow characters greater than ASCII 127? In this case, we may want to build the pattern in a loop.
SELECT *
FROM @TableWithBadRows;
SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[^A-Z0-9%]%';
-- assumed initialization (lost in the original extract): start from CHAR(32)
DECLARE @i INT = 32 ,
        @ch CHAR(1) ,
        @pattern VARCHAR(500) = '' ;

WHILE @i < 47
BEGIN
    SET @ch = CHAR(@i) ;
    IF @ch = '_'
        SET @pattern = @pattern + '[' + @ch + ']' ;
    ELSE IF @ch = '['
        SET @pattern = @pattern + @ch + @ch ;
    ELSE
        SET @pattern = @pattern + @ch ;
    SET @i = @i + 1 ;
END

SET @i = 58 ;
WHILE @i < 65
BEGIN
    SET @ch = CHAR(@i) ;
    IF @ch = '_'
        SET @pattern = @pattern + '[' + @ch + ']' ;
    ELSE IF @ch = '['
        SET @pattern = @pattern + @ch + @ch ;
    ELSE
        SET @pattern = @pattern + @ch ;
    SET @i = @i + 1 ;
END
SELECT @pattern ;

SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[' + @pattern + ']%' ;
As you can see from the second select statement, the CHAR(200) (È) is not being filtered by the a-z filter
as it is apparently considered a letter.
We may try adding binary collation to treat that letter as bad, e.g.
SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[^A-Za-z0-9% ]%' COLLATE Latin1_General_BIN;
This thread "Getting records with special characters" shows how to create a pattern when the bad
characters are in the special table and also which characters ([,^,-) we need to escape.
Conclusion
I have shown several examples of filtering bad rows using various patterns.
Random String
Introduction
In this article we are going to show several techniques for building a random string. This is very useful for maintenance tasks like testing (populating large tables with random values), generating random passwords, and so on...
If you have any other way of doing it, then you are most welcome to edit this article and give us your
insight :-)
Solutions
Let's examine several solutions. These solutions came from forum users, and we will try to put them into perspective of advantages & disadvantages. We will close the article with conclusions and recommendations. If you are just looking for the best solution, you can jump to the end.
Basic idea
1. Create a random string using the function NEWID(); this gives us a random 36-character string.
2. Create a random number using the function NEWID, as the string length.
3. Cut the string using the function LEFT.
Code
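The code block was not preserved in this excerpt; a minimal sketch of the idea described above (NEWID as the base string, another NEWID-derived number as the random length):

```sql
-- random string of random length 1..36, built from a NEWID value
SELECT LEFT(CAST(NEWID() AS CHAR(36)),
            ABS(CHECKSUM(NEWID())) % 36 + 1) AS RandomString ;
```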
2. Using Clean NEWID as base string & NEWID to generate a random length.
Basic idea
1. Create a random string using the function NEWID; this gives us a random 36-character string.
2. Remove the dash characters; this gives us a random 32-character string.
3. Create a random number using the function NEWID, as the string length.
4. Cut the string using the function LEFT.
Code
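Again the code block is missing here; a sketch of the "clean NEWID" variant described above:

```sql
-- strip the dashes first, then cut to a random length 1..32
SELECT LEFT(REPLACE(CAST(NEWID() AS CHAR(36)), '-', ''),
            ABS(CHECKSUM(NEWID())) % 32 + 1) AS RandomString ;
```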
Basic idea
We can use existing data, which is not random, as a base string. Then we use text manipulation like "data scrambling", "data parsing", "random sorting" and so on, in order to get "look-like-random" data.
* This idea can be improved significantly in scale by using an existing random data table!
;WITH cte_1 AS(
SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS RN, t.name
FROM sys.tables AS t
CROSS JOIN sys.tables AS tt
),
cte_2 AS(
SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS RN, t.name
FROM sys.columns AS t
CROSS JOIN sys.columns AS tt
)
SELECT
cte_1.name + cte_2.name AS RandomString1,
REPLICATE(cte_1.name + cte_2.name,
          CASE WHEN ABS(CHECKSUM(NEWID())) % 4 = 0 THEN 1
               ELSE ABS(CHECKSUM(NEWID())) % 4 + 1 END) AS RandomString2
FROM cte_1
INNER JOIN cte_2
ON cte_1.RN = cte_2.RN
In the example above we just used the table names in the system as base strings for manipulation. This is only an example, as this idea (using existing data) can be applied in any way and on any tables that we want.
The solution is based on existing data. The more manipulation we do, the more the result "looks like" random data.
Basic idea
We are using a UDF to create a single random string. The function gets 2 parameters: (A) the maximum length of the string, and (B) whether to create a string of exactly the maximum length or of a random length.
/******************************
* Version("2.0.0.0")
* FileVersion("2.0.0.0")
* WrittenBy("Ronen Ariely")
* WebSite("http://ariely.info/ ")
* Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx ")
******************************/
CREATE function [dbo].[ArielyRandomStringFunc_Ver2.0.0](@NumberOfChar int,
@IsFixedLength bit = true)
returns nvarchar(MAX)
WITH EXECUTE AS CALLER
AS
begin
DECLARE @TotalNumberOfCharToReturn int
IF (@IsFixedLength = 1)
SET @TotalNumberOfCharToReturn = @NumberOfChar
ELSE
    -- I am using my own random function
    -- you can read more about the reasons here:
    -- Using side-effecting built-in functions inside a UDF (your function)
    -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effecting-build-in-functions-inside-a-UDF-your-function.aspx
    SET @TotalNumberOfCharToReturn
        = CONVERT(int, (AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) + 1)
A: Relatively fast.
A: Full Random
A: No length limit
A: No characters limit
D: No filtering option for security
5. Selecting from characters list-> Using Loop to build a flexible string length
Basic idea
The basic idea is the same as above, with the option of filtering characters, since we choose from a list. We choose a random number in order to pick the character at that position in our list. We use a loop to build the entire string.
Code
/******************************
* Version("1.0.0.0")
* FileVersion("1.0.0.0")
* WrittenBy("Ronen Ariely")
* WebSite("http://ariely.info/ ")
* Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx ")
******************************/
CREATE function [dbo].[ArielyRandomStringFunc_Ver1.0.0](@NumberOfChar int,
@IsFixedLength bit = true)
returns nvarchar(MAX)
WITH EXECUTE AS CALLER
AS
begin
DECLARE @TotalNumberOfCharToReturn int
IF (@IsFixedLength = 1)
SET @TotalNumberOfCharToReturn = @NumberOfChar
ELSE
    -- I am using my own random function
    -- you can read more about the reasons here:
    -- Using side-effecting built-in functions inside a UDF (your function)
    -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effecting-build-in-functions-inside-a-UDF-your-function.aspx
    SET @TotalNumberOfCharToReturn
        = CONVERT(int, (AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) + 1)

    -- We could just pick a random number and use CHAR(random number) to get a random char
    -- It is faster
    -- But this is insecure, and I preferred to choose from a secured characters list here
A: Relatively fast.
A: Full Random
A: No length limit
A: No characters limit
A: Filtering option for security
6. Building a fixed-length random string using NEWID and NCHAR -> Cut randomly using LEFT
Basic idea
Code
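The original code block is missing from this excerpt; a sketch of the stated idea follows. Note the character range used here (A-Z) is an assumption for safety; the original may have used a much wider NCHAR range:

```sql
-- build a fixed-length (8) string from random NCHAR values, then cut it randomly with LEFT
;WITH N AS ( SELECT TOP (8) NCHAR(ABS(CHECKSUM(NEWID())) % 26 + 65) AS ch
             FROM sys.objects )
SELECT LEFT(( SELECT ch AS [text()] FROM N FOR XML PATH('') ),
            ABS(CHECKSUM(NEWID())) % 8 + 1) AS RandomString ;
```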
Basic idea
The option of using CLR functions was added in SQL Server 2005, and yet most DBAs do not use it. DBAs have to internalize the extreme improvement that can (sometimes) be achieved by using CLR! While SQL Server works well with sets of data, CLR works much better at manipulating strings (split, regular expressions...).
Code
http://social.technet.microsoft.com/wiki/contents/articles/21219.sql-server-create-random-string-
using-clr.aspx
* Links to other versions can be found in the resources.
A: VERY FAST.
A: Extremely flexible.
o No Length limit
o No characters limit
A: Filtering option for security
D: Requires enabling the use of CLR (hopefully you have done it already!)
Without a doubt, the CLR function is the best solution! If you can use it, then choose it. Tests have shown that this function can produce in less than 2 seconds what other functions were not able to produce in more than 20 minutes (the execution was terminated after 20 minutes). This solution meets any requirement.
It is highly recommended not to use a solution without a filtering mechanism! Several Unicode characters might be harmful in some situations. You can get more information about the problematic CHAR zero, for example, in this link [Hebrew].
If you need (A) a fast query, (B) flexible & unlimited length, or (C) a filtering mechanism to choose the characters that can be used, then use solution 5, or change solution 4 a bit to add filtering.
If you need (A) a fast query and (B) a short maximum string length and (C) you have to use the whole character range, then you can use solution 6.
Sort Letters in a Phrase using T-SQL
Problem definition
This article comes from this MSDN forum post. The problem is: how can we sort the letters in a phrase using just T-SQL? To clarify the question, for instance, the desired result for CHICAGO must be ACCGHIO.
Introduction
Because SQL is a declarative language for relational systems, it does not have arrays. A table is a relational variable that presents a relation; simply put, it is a set, and a set has no order. But if someone needs to do this sort in SQL Server, for example because of a need to sort and compare in a huge table, how can we handle it?
Solution
T-SQL has additional features beyond the relational model, so there is a solution to this problem. The first sub-problem is how to assign an array index to the letters in a phrase.
One answer is to use the spt_values helper table. The following sample code shows the functionality that we will use later.
DECLARE @String VARCHAR(MAX)
The following figure shows the result of the code. It shows the array index assigned per letter.
Now it’s possible to solve the main problem. Next script produces the sample data.
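The scripts referenced above were not preserved in this excerpt; here is a sketch of the whole idea, using spt_values to index the letters and FOR XML PATH to glue them back together in sorted order:

```sql
DECLARE @String VARCHAR(MAX) = 'CHICAGO' ;

-- index each letter via the numbers in spt_values, sort, then reassemble
SELECT ( SELECT SUBSTRING(@String, number, 1) AS [text()]
         FROM master.dbo.spt_values
         WHERE type = 'P'
           AND number BETWEEN 1 AND LEN(@String)
         ORDER BY SUBSTRING(@String, number, 1)
         FOR XML PATH('') ) AS SortedPhrase ;   -- CHICAGO -> ACCGHIO
```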
Limitations
Using this solution has two limitations that come from the spt_values helper table. These limits are:
1. Data Type
The spt_values approach returns extra records for Unicode data types. So the data type cannot be Unicode, such as NVARCHAR.
2. Data Length
Dates Related
T-SQL: Date-Related Queries
In this article I plan to add various interesting date-related queries. This article will expand as new problems present themselves in the Transact-SQL forum.
I want to start with a simple question that was posted today (May 31, 2013): how to find today's day number counted from the beginning of the year.
This is my solution and a bit of explanation at the end
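The solution code did not survive conversion; a sketch of the anchor-date approach explained below (the anchor value itself is an assumption; any known January 1st works):

```sql
DECLARE @AnchorDate DATETIME = '20000101' ;  -- known anchor date (a January 1st)

-- roll the anchor forward by the whole-year difference to get this year's start
DECLARE @YearStart DATETIME =
        DATEADD(YEAR, DATEDIFF(YEAR, @AnchorDate, GETDATE()), @AnchorDate) ;

SELECT DATEDIFF(DAY, @YearStart, GETDATE()) + 1 AS DayNumberOfYear ;
```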
The @YearStart variable dynamically calculates the beginning of the year for any date, based on the year difference with a known date that we use as an anchor date.
However, there is a much simpler solution, as suggested by Gert-Jan Strick in the thread I referenced:
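That simpler suggestion was presumably along these lines, using the built-in day-of-year datepart:

```sql
SELECT DATEPART(DAYOFYEAR, GETDATE()) AS DayNumberOfYear ;
```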
Today the Transact-SQL MSDN forum presented the following problem: "Change date parameters to find data from previous month".
Date Computation
I was working on a financial project with my own custom implementation for SQL Server, and I found date calculations to be extremely important for most of the applications on today's market, so I thought of publishing an article on the topic. Date computation is needed by almost all financial applications and has a wide range of uses in the financial, retail, and other industries.
This article provides a collection that will be extremely helpful for programmers who are using SQL Server for their projects.
select GETDATE()
Output:
2013-07-27 14:45:44.463
Finding Start Date and End Date of the Week
The following will give the start date of the current week. Assume the current date is 27th July 2013.
2013-07-22 00:00:00.000
2013-07-26 14:51:36.1
This assumes that the beginning of the week is Monday and the end is Friday, based on business days.
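The queries that produced the two values above were lost in conversion; a sketch using the common DATEDIFF/DATEADD week-rounding idiom (day 0 in SQL Server's datetime system is a Monday, which is why this lands on Monday):

```sql
-- start of the current week (Monday), e.g. 2013-07-22 for 27th July 2013
SELECT DATEADD(WEEK, DATEDIFF(WEEK, 0, GETDATE()), 0) AS WeekStart ;

-- end of the business week (Friday) = Monday + 4 days
SELECT DATEADD(DAY, 4, DATEADD(WEEK, DATEDIFF(WEEK, 0, GETDATE()), 0)) AS WeekEnd ;
```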
This part is pretty tricky, as the present day can be in either the first or the second half, and the month may contain 28, 29, 30, or 31 days.
We will treat days 1-15 as the first half, as used by most financial institutions, and then, based on where the date falls, we compute the two weeks.
The following code provides beginning and end dates for two weeks:
if DAY(getdate()) <= 15  -- note: DAY, not MONTH, since we test the day of the month
begin
    -- first half of the month
end
else
begin
    -- second half of the month
end
This will output 1-14 or 15-end of month as the begin and end dates.
The following query provides start and end date of current month:
2013-07-01 00:00:00.000
2013-09-30 00:00:00.000
This is a quite complicated part. We need to find whether the date falls in the first half or the second half of the year, and there is no direct method available in SQL Server to do this.
The following query provides start and end dates for half year:
2013-07-01 00:00:00.000
2013-12-01 00:00:00.000
Finding Start Date and End Date For Year
The following query finds start and end date for the current year:
2013-01-01 15:15:47.097
2013-12-31 15:15:47.113
SQL Server: How to Find the First Available Timeslot for Scheduling
In a scheduling application, it may be desirable to find the first available schedule time (timeslot) for a new appointment. The new appointment must fit completely between existing appointments, without overlap. As the schedule fills, new entries are assigned to the next first-available timeslot. Alternatively, if desired, the first n available timeslots can be returned for selection.
In the sample data below, the Schedule table is pre-filled with a Start of Day record and an End of Day record. Normally, that information would be derived from a JOIN with a Calendar table.
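The sample-data script is not reproduced in this excerpt; based on the query and results below, it presumably looked something like this (the appointment times are assumptions shaped to match the output shown):

```sql
DECLARE @AppNeed INT = 60 ;  -- minutes needed for the new appointment

DECLARE @MySchedule TABLE ( AppStart DATETIME, AppFinish DATETIME ) ;

INSERT INTO @MySchedule ( AppStart, AppFinish )
VALUES ( '2007-01-11 08:30', '2007-01-11 08:30' ),  -- Start of Day record
       ( '2007-01-11 09:00', '2007-01-11 10:15' ),
       ( '2007-01-11 11:30', '2007-01-11 14:45' ),
       ( '2007-01-11 16:00', '2007-01-11 18:30' ),
       ( '2007-01-11 20:00', '2007-01-11 20:00' ) ; -- End of Day record
```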
A solution for SQL Server 2005 / SQL Server 2008 is provided below.
;WITH CTE
AS ( SELECT
*,
RowNumber = ROW_NUMBER() OVER( ORDER BY AppStart ASC )
FROM @MySchedule
)
SELECT TOP 3 ApptOptions = a.AppFinish
FROM CTE a
INNER JOIN CTE b
ON a.RowNumber = b.RowNumber - 1
WHERE datediff( minute, a.AppFinish, b.AppStart) >= @AppNeed
ApptOptions
2007-01-11 10:15:00.000
2007-01-11 14:45:00.000
2007-01-11 18:30:00.000
Additional Resources
A Calendar table is a very useful utility table that can benefit many data-querying situations.
For this example, two additional columns (AppStart, AppFinish) can be added to the table to handle situations where business hours are not the same for all days.
T-SQL: Group by Time Interval
A question was posted today in the Transact-SQL forum, "Counts by Time Interval". The thread originator wanted to know how to find how many jobs were completed in each hour of a certain interval for the current shift. The solution I implemented is based on the DATEPART function, which allows us to get the hour part of a datetime variable (or column).
Solution
FROM dbo.Jobs
This solution assumes that the @StartTime and @EndTime variables will be set for the current day's interval (otherwise we may want to add CAST(JobComplete AS DATE) into the select list and the GROUP BY list).
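The query itself is truncated above (only the FROM clause survived); a sketch of the DATEPART-based hourly grouping it describes, where the column name JobComplete is taken from the text and the variable values are assumptions:

```sql
DECLARE @StartTime DATETIME = '2013-05-31 08:00',
        @EndTime   DATETIME = '2013-05-31 16:00' ;

SELECT DATEPART(HOUR, JobComplete) AS JobHour,
       COUNT(*) AS CompletedJobs
FROM dbo.Jobs
WHERE JobComplete >= @StartTime
  AND JobComplete <  @EndTime
GROUP BY DATEPART(HOUR, JobComplete)
ORDER BY JobHour ;
```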
Now, this is a very straightforward problem. What if we need to solve the slightly more complex problem of grouping by every 15 (or Nth) minutes? I discussed this before as the first problem in the blog post "Interesting T-SQL problems". Below is a solution from that blog post:
;WITH cte AS
(SELECT DATEADD(minute, 15 * (DATEDIFF(minute, '20000101', SalesDateTime) / 15),
                '20000101') AS SalesDateTime,
        SalesAmount
 FROM @Sales)
Select SalesDateTime, Cast(Avg(SalesAmount) As decimal(12,2)) As AvgSalesAmount
From cte
Group By SalesDateTime;
Finally, a few notes on the possibility of missing data. If we want to display data for all times in the predefined interval, even when we don't have data for a particular hour, we need a Calendar-table analogue first, and then LEFT JOIN from that table of all needed time intervals to our summary solution.
CHAPTER 9:
XML
Avoid T (space) while generating XML using FOR XML clause
The following code shows an example of how to avoid the 'T' separator (and emit a space instead) while generating XML using the FOR XML clause.
Sample Data:
--you will find 'T' (instead of a space) if you do not convert the date column with a proper datetime style
SELECT * FROM @Employee
FOR XML PATH('Employee')
<Employee>
<ID>1</ID>
<Name>Sathya</Name>
<DOJ>2013-06-08T08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>2</ID>
<Name>Madhu K Nair</Name>
<DOJ>2008-06-08T08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>3</ID>
<Name>Vidhyasagar</Name>
<DOJ>2008-06-08T08:50:52.687</DOJ>
</Employee>
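The query producing the space-separated variant below was not preserved; presumably it converts the datetime column with an explicit style, along these lines (style 121 keeps the milliseconds and uses a space separator):

```sql
SELECT ID,
       Name,
       CONVERT(VARCHAR(23), DOJ, 121) AS DOJ  -- 'yyyy-mm-dd hh:mi:ss.mmm'
FROM @Employee
FOR XML PATH('Employee') ;
```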
<Employee>
<ID>1</ID>
<Name>Sathya</Name>
<DOJ>2013-06-08 08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>2</ID>
<Name>Madhu K Nair</Name>
<DOJ>2008-06-08 08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>3</ID>
<Name>Vidhyasagar</Name>
<DOJ>2008-06-08 08:50:52.687</DOJ>
</Employee>
Generate XML with Same Node Names using FOR XML PATH
In this post we are going to see how we can generate XML in the format shown below from relational data.
<row>
<column>1</column>
<column>1</column>
</row>
<row>
<column>2</column>
<column>2</column>
</row>
Here is an example:
--Sample data
--If we mention the same alias name for all columns, all column values will be merged
/**
<row>
<column>1</column>
<column>1</column>
</row>
<row>
<column>2</column>
<column>2</column>
</row>
**/
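The working query was lost in conversion. The usual trick, sketched here as an assumption, is to alias every column with the same name and place an empty-string dummy column between them so that FOR XML PATH does not merge adjacent same-name nodes:

```sql
DECLARE @t TABLE ( col1 INT, col2 INT ) ;
INSERT INTO @t VALUES (1, 1), (2, 2) ;

SELECT col1 AS [column],
       '' ,                 -- dummy separator prevents merging same-name nodes
       col2 AS [column]
FROM @t
FOR XML PATH('row') ;
```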
Generate XML - Column Names with their Values as text() Enclosed
within their Column Name Tag
The most commonly used XML format is the following: column names with their values as text() enclosed within their column-name tag.
Let's find out how to generate the following XML for the table provided below:
<Employees>
<field Name="ID">1</field>
<field Name="Name">Sathya</field>
<field Name="Age">25</field>
<field Name="Sex">Male</field>
<field Name="ID">2</field>
<field Name="Name">Madhu K Nair</field>
<field Name="Age">30</field>
<field Name="Sex">Male</field>
<field Name="ID">3</field>
<field Name="Name">Vidhyasagar</field>
<field Name="Age">28</field>
<field Name="Sex">Male</field>
</Employees>
Here is an example:
DECLARE @xmldata XML ;  -- assumed declaration (not shown in the original)
SET @xmldata = (SELECT ID,Name,Age,Sex FROM @Employee FOR XML PATH (''))
SET @xmldata = (
SELECT ColumnName AS "@Name",
ColumnValue AS "text()"
FROM(
SELECT i.value('local-name(.)','varchar(100)') ColumnName,
i.value('.','varchar(100)') ColumnValue
FROM @xmldata.nodes('//*[text()]') x(i)) tmp
FOR XML PATH ('field'),root('Employees'))
SELECT @xmldata
SQL Server XML: Sorting Data in XML Fragments
Working with data sets has made us all aware of the fact that a set has no order. XML documents are not data sets; they always have a natural (document) order.
Problem Definition
The original problem was to sort an XML document with two levels: Get double-sorted XML document from XML document.
Approaches
A - Using T-SQL
Using T-SQL means that we need to deconstruct our data by parsing out the necessary values. We use the nodes() method to extract the elements at the level we want to sort. Then we extract the order criteria with the value() method so we can sort with the ORDER BY clause. Finally, we reconstruct the XML fragment by using FOR XML with the PATH mode.
Here is the trivial case. We are completely deconstructing a flat hierarchy and use only the data in T-SQL
to reconstruct the XML:
WITH Deconstructed AS (
SELECT Element.value('@name', 'NVARCHAR(255)') AS ElementName
FROM @Data.nodes('/element') [Elements] ( Element )
)
SELECT ElementName AS [@name]
FROM Deconstructed
ORDER BY ElementName
FOR XML PATH('element');
WITH Deconstructed AS (
SELECT Element.value('@name', 'NVARCHAR(255)') AS ElementName,
Element.query('.') AS ElementContent
FROM @Data.nodes('/element') [Elements] ( Element )
)
SELECT ElementContent AS '*'
FROM Deconstructed
ORDER BY ElementName
FOR XML PATH('');
B - Using XQuery
Using XQuery means that we use the order clause of a FLWOR statement .
SELECT Fragment.query('
  for $element in /element
  order by $element/@name ascending
  return $element')
FROM @Data.nodes('.') Fragment ( Fragment );
As the XQuery FLWOR statement already works on nodes, we already have a solution for the more
complex case:
SELECT Levels.query('
for $level1 in /level1
order by $level1/@name ascending
return $level1
')
FROM @Data.nodes('.') Levels ( Levels );
Here is the result, the list is only sorted on the top level:
<level1 name="1">
<level2 name="a" />
</level1>
<level1 name="2">
<level2 name="c" />
<level2 name="b" />
</level1>
<level1 name="3">
<level2 name="f" />
<level2 name="e" />
<level2 name="d" />
</level1>
Here we already see that we need a kind of nested sort, because we have only sorted the outer levels.
In a FLWOR statement we can use complex return expressions, especially we can use further FLWOR
statements:
SELECT Levels.query('
for $level1 in /level1
order by $level1/@name ascending
return
<level1 name="{$level1/@name}">{
for $level2 in $level1/level2
order by $level2/@name ascending
return $level2
}</level1>
')
FROM @Data.nodes('.') Levels ( Levels );
<level1 name="1">
<level2 name="a" />
</level1>
<level1 name="2">
<level2 name="b" />
<level2 name="c" />
</level1>
<level1 name="3">
<level2 name="d" />
<level2 name="e" />
<level2 name="f" />
</level1>
Conclusion
Using the T-SQL approach means that we need to handle the conversion from and to XML to overcome
the barrier between XML and T-SQL. While this is only a small step, it simply means more code. And
more code is more complex per se.
The XQuery FLWOR expression, on the other hand, allows a more compact notation, and this kind of XQuery processing was built exactly for these kinds of manipulations. It is the better choice in our case.
Terminology
http://www.w3.org/TR/xml-fragment.html#defn-fragment
http://www.validome.org/xml/validate/
http://en.wikipedia.org/wiki/FLWOR
http://www.w3.org/TR/xquery/#id-flwor-expressions
How to Extract Data in XML to Meet the Requirements of a Schema
Introduction
This article is based on a question posted on the TechNet Forum Brazil for SQL Server, "XML - Mode EXPLICIT, CDATA", and provides a solution to a common problem: formatting the result of a T-SQL query into XML that adequately meets the conditions of an XML Schema (XSD) or a Document Type Definition (DTD).
This is one of the possible solutions to this problem. If you know other options in T-SQL that meet the needs of this problem, feel free to add your content to this article.
Problem
While reading threads in the SQL forum, I found the following question under discussion: "I'm trying to generate XML using EXPLICIT mode because I need to use CDATA in some fields. The problem is that an XML Schema requires that my XML have some attribute names, such as "g:AttributeName". And WITH XMLNAMESPACES is not compatible with the EXPLICIT mode of T-SQL."
It is clear that the person who asked the question, even with some difficulty expressing it, needs to get the XML data in the following format:
The XML expected by the poster should result in something similar to this content:
This also occurs in environments with similar platforms, but to a lesser extent. This need for data integration between companies is very old; even different departments/branches need to ensure that their shared data is always up to date. Today, SQL Server 2012 has the resources to handle the kind of data processing that we will present, but these same features can be obtained with greater depth through BizTalk Server.
Diagnostic Steps
Once we diagnose the cause of the problem, we can move to its proper resolution. There may be other alternative solutions, but the one indicated at the end of this article answers the question posted in the forum in the simplest and most practical way possible.
Solution
To structure the solution, we must be clear about all the conditions of the asker's XML Schema, even though it was not submitted.
Although the asker was trying to get the desired XML format via a T-SQL query in EXPLICIT mode, this mode does not support the XML Schema's requirement of the "g" namespace prefix. Therefore, we present the solution with a T-SQL query using the RAW mode.
To set the "g" namespace prefix and its URI, we alias the table fields with this prefix using the standard XML Schema separator character.
Each line must have a tag called "item"; since rows are tagged "row" in RAW mode by default, we rename the "row" tag to "item".
To complete all the requirements stipulated by the asker, we use the ROOT directive so that the root tag of the whole XML document has the defined name "xml".
-- the namespace URI below is a placeholder; use the URI required by your XML Schema
WITH XMLNAMESPACES ('http://example.com/g' AS g)
SELECT
    CD_Product as 'g:ID',
    NM_Product as 'g:NAME'
FROM dbo.test
FOR XML RAW ('item'), ROOT ('xml');
The result is displayed as expected by the person asking the question (Figure 2):
Additional Information
If you want to know how to consume and validate the contents of an XML document against an XSD or DTD using the VB.NET or C# programming languages,
I recommend reading Knowledge Base articles (KB) 315533 and 318504.
Credits
This article was inspired by the following articles:
Wiki: Templates For Converting a Forum Thread Into a New Wiki Article
Wiki: Technical Editing
Wiki: Best Practices for Source References and Quotes
Thanks to Sandro, Naomi, and Peter for the constant guidance in your articles. This motivated me to create this article!
To strengthen your knowledge about XML, XSD, and DTD, I recommend reading the following:
References
Read some advanced articles:
TechNet Library
Read the following topics:
Miscellaneous
T-SQL Script to update string NULL with default NULL
Problem
It is common to have nullable columns in a table, but if we populate those nullable columns with the string 'NULL' instead of a real (default) NULL, a problem arises.
If we populate nullable columns with the string 'NULL', we cannot make use of the NULL functions available in SQL Server.
For Example:
USE [AdventureWorks2012]
GO
--Create test table with two columns to hold string & default NULL
CREATE TABLE Test_Null(Id INT IDENTITY(1,1),StringNull VARCHAR(10)
,DefaultNull VARCHAR(10))
INSERT Test_Null (StringNull) SELECT 'NULL'
INSERT Test_Null SELECT 'NULL',NULL
--Execute the two queries below to find how "IS NULL" works with string & default NULL
SELECT * FROM Test_Null WHERE StringNULL IS NULL
SELECT * FROM Test_Null WHERE DefaultNull IS NULL
--Execute the two queries below to find how "ISNULL" works with string & default NULL
SELECT ISNULL(StringNULL,0) StringNULL FROM Test_Null
SELECT ISNULL(DefaultNull,0) DefaultNull FROM Test_Null
Solution
USE [AdventureWorks2012]
GO
SET NOCOUNT ON
DECLARE @query NVARCHAR(MAX),
@table_count INT,
@column_count INT,
@tablename VARCHAR(100),
@Columnname VARCHAR(100),
@Schemaname VARCHAR(100) = 'HumanResources', --schema names to be passed
@i INT = 1,
@j INT = 1
DECLARE @MyTableVar TABLE(Number INT IDENTITY(1,1),
Table_list VARCHAR(200));
DECLARE @MyColumnVar TABLE(Number INT IDENTITY(1,1),
Column_list VARCHAR(200));
INSERT INTO @MyTableVar
SELECT name
FROM sys.tables
WHERE TYPE = 'U' AND SCHEMA_NAME(SCHEMA_ID) = @Schemaname
SELECT @table_count = MAX(Number) from @MyTableVar
WHILE @i <= @table_count
BEGIN
SELECT @tablename = Table_list FROM @MyTableVar WHERE Number = @i
--start the inner loop at the first column row added for this table
--(the IDENTITY in @MyColumnVar keeps counting across tables)
SELECT @j = ISNULL(MAX(Number), 0) + 1 FROM @MyColumnVar
INSERT @MyColumnVar
SELECT C.name
FROM SYS.columns C
INNER JOIN SYS.tables T ON T.object_id = C.object_id
INNER JOIN SYS.types TY ON TY.user_type_id =
C.user_type_id AND TY.system_type_id = C.system_type_id
WHERE SCHEMA_NAME(T.SCHEMA_ID) = @Schemaname
AND OBJECT_NAME(T.OBJECT_ID) = @tablename AND T.type = 'U'
AND C.is_nullable = 1
AND TY.system_type_id IN (167,175,231,239) --only character columns
ORDER BY C.column_id
SELECT @column_count = MAX(Number) FROM @MyColumnVar
WHILE @j <= @column_count
BEGIN
SELECT @Columnname = Column_list FROM @MyColumnVar WHERE Number = @j
SET @query = 'UPDATE ['+@Schemaname+'].['+@tablename+'] SET ['+@Columnname+']
= NULL WHERE ['+@Columnname +'] = ''NULL''' + CHAR(10) + 'GO'
SET @j = @j + 1
PRINT @query
--To execute the generated UPDATE scripts directly, first remove the trailing 'GO'
--(a batch separator, not valid inside EXEC) from @query
--EXEC (@query)
END
SET @i = @i + 1
END
Notes:
i) The above code generates UPDATE scripts for tables that belong to the schema name passed in the
variable @Schemaname
ii) The above code generates UPDATE scripts only for character columns (VARCHAR, CHAR, NVARCHAR, NCHAR)
iii) The code is tested and working with SQL Server 2008 and SQL Server 2012.
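For instance, assuming @Schemaname were set to 'dbo' in a database containing the Test_Null table from
the Problem section above, one of the printed statements would look like this:

```sql
-- Hypothetical output of the generator for the Test_Null example table
UPDATE [dbo].[Test_Null] SET [StringNull] = NULL WHERE [StringNull] = 'NULL'
GO
```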
FIFO Inventory Problem - Cost of Goods Sold
In this article I am going to explain the FIFO (first in first out) algorithm for calculating cost of goods sold.
This is the real business problem I am working on now.
There are many articles on the Internet explaining concepts of Calculating Cost of Goods On Hand and
Cost of Goods Sold in the inventory calculation. I will give just a few of them and quote a bit of material
from these articles to provide a brief overview. I suggest readers of this article review the mentioned
articles or just do a Google search on the terms "FIFO Cost of Goods Inventory calculation".
Chapter 4 by Hugo Kornelis in the "SQL Server MVP Deep Dives" book (the first book) talks a bit about
the Running Total problem, so it may be useful to read that chapter as well.
There are several valuation methods, but for small businesses it is generally restricted to FIFO and
Moving Average.
In our application we have two methods of calculating inventory: RWAC (Running Weighted Average
Cost) and FIFO. The preferred method of the calculation can be set in the Inventory Preference form.
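As a toy illustration of how the two methods differ (my own example, not from the original articles):
suppose we buy 10 units at $2.00, then 10 units at $3.00, and then sell 5 units.

```sql
-- FIFO vs. weighted average on the toy example above:
-- FIFO charges the 5 oldest units at $2.00 each;
-- the weighted average cost is (10*2.00 + 10*3.00) / 20 = $2.50 per unit.
SELECT 5 * 2.00 AS FifoCostOfGoodsSold,    -- 10.00
       5 * 2.50 AS AverageCostOfGoodsSold; -- 12.50
```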
After briefly discussing the theory, I am going to talk about implementing the FIFO algorithm for
calculating the Cost of Goods in our software. Historically, our application had only the simpler RWAC
method (and not even true RWAC, but rather just an Average Cost method). A few years ago the
company management team decided that it was time to offer our clients a FIFO method of calculating Cost
of Goods On Hand and Cost of Goods Sold. My first task was to identify all the places in our software where
we might need adjustments, and my colleague was tasked with creating the necessary T-SQL functions.
Each inventory item was defined by these attributes: department, category, item, invent_id, locatn_id.
These 5 columns are used to identify a single inventory item in its current location. Quantity and
unit_cost columns are used to identify each inventory movement. In case of Sales or Returns
(Trans_Type = 'S') the unit_cost is 0 and has to be calculated. Trans_Type can be one of the following: P -
purchase, A - adjustment, T - transfer and S - Sale (negative quantity) or Return (positive quantity). The
ref_no column, in the case of sales/returns, provides a reference to the trans_no in the transactions
table. The date_time column is also important for our calculations. The other columns in the Inventory
table are used for other purposes and are not relevant to the calculation of Cost of Goods on Hand or
Cost of Goods Sold.
So, as I said, the first implementation of the Cost of Goods on Hand calculation was written by my
colleague as a multi-statement table-valued function that accepted many parameters (some of them
optional) plus the calculation method type (RWAC, FIFO or LIFO), and returned the result as a table.
I checked the date of the first implementation in our SourceSafe repository and it is August 2010.
It was quickly determined that using a multi-statement table-valued function in that way led to
very bad performance. Also, somehow the process of developing the functions (or procedures) to do
these calculations ended up in my hands. I tried to change these functions into inline table-valued
functions, one for each method (one for FIFO and one for RWAC; we decided to drop the LIFO method
then), but the performance of these set-based functions was still really bad for clients with
substantial inventory movement.
In addition to discussing the FIFO calculation problem in forum threads, I also had a private e-mail
exchange with Peter Larsson, who eventually helped me adapt his solution from Set-based Speed
Phreakery: The FIFO Stock Inventory SQL Problem to our table structure and the Cost of Goods on
Hand problem.
I discussed this problem in many threads in the Transact-SQL forum on MSDN. Here is one of the earliest
threads (from May 2011), where I found that an inline CTE-based solution, when we needed to use the same
CTE multiple times, was significantly slower than using temp tables to hold intermediate calculations.
I just re-read that long thread. Essentially, I confirmed that using an inline UDF to calculate Cost of Goods
on Hand for the selected inventory, applied with CROSS APPLY, was very slow compared
to first getting the inventory to work with into a temporary table and then applying the calculations in a
stored procedure.
It also started to become clear that using our current table structure, with nothing pre-calculated,
will lead to bad performance, as we need to re-calculate the cost every time from the very
beginning. About a year or so ago I proposed a plan to re-design our inventory table by adding a few more
tables that we would update at the same time as each transaction occurs. Unfortunately, we haven't proceeded
in this direction yet, and I don't know if we are ever going to look into these ideas in order to make the
calculation process easier.
In this article I planned to discuss the Cost of Goods Sold calculations, so I will just give the current code
of the Cost of Goods on Hand FIFO procedure without too many explanations.
Current procedure to calculate Cost of Goods on Hand
--==========================================================
/* SP that returns total quantity and cost of goods on hand
by department, category, item, invent_id, and locatn_id,
using FIFO (first in/first out) method of cost valuation.
To retrieve the total (FIFO) cost of goods on hand
for all inventory by location, by department:
EXECUTE dbo.siriussp_CostOfGoodsOnHand_FIFO 1
*/
CREATE PROCEDURE dbo.siriussp_CostOfGoodsOnHand_FIFO (@bIncludeZeroes BIT = 1)
-- header reconstructed from the body; the original may declare additional parameters
AS
BEGIN
SET NOCOUNT ON;
WITH cteInventorySum
AS (SELECT department,
category,
item,
invent_ID,
locatn_ID,
SUM(quantity) AS TotalInventory,
MAX(date_time) AS LastDateTime
FROM #Inventory
GROUP BY department,
category,
item,
invent_ID,
locatn_ID),
cteReverseInSum
AS (/* Perform a rolling balance ( in reverse order ) through the
inventory movements in */
SELECT s.department,
s.category,
s.item,
s.invent_ID,
s.locatn_ID,
s.Fifo_Rank,
(SELECT SUM(i.quantity)
FROM #Inventory AS i
WHERE i.department = s.department
AND i.category = s.category
AND i.item = s.item
AND i.invent_id = s.invent_id
AND i.locatn_id = s.locatn_id
AND i.trans_Type IN ('P','A','T')
AND i.Fifo_Rank >= s.Fifo_Rank) AS RollingInventory,
SUM(s.Quantity) AS ThisInventory
FROM #Inventory AS s
WHERE s.Trans_Type IN ('P','A','T')
GROUP BY s.Department,
s.Category,
s.Item,
s.Invent_ID,
s.Locatn_ID,
s.Fifo_Rank),
cteWithLastTranDate
AS (SELECT w.Department,
w.Category,
w.Item,
w.Invent_ID,
w.Locatn_ID,
w.LastDateTime,
w.TotalInventory,
COALESCE(LastPartialInventory.Fifo_Rank,0)
AS Fifo_Rank,
COALESCE(LastPartialInventory.InventoryToUse,0)
AS InventoryToUse,
COALESCE(LastPartialInventory.RunningTotal,0)
AS RunningTotal,
w.TotalInventory
- COALESCE(LastPartialInventory.RunningTotal,0)
+COALESCE(LastPartialInventory.InventoryToUse,0) AS UseThisInventory
FROM cteInventorySum AS w
OUTER APPLY (SELECT TOP ( 1 ) z.Fifo_Rank,
z.ThisInventory AS InventoryToUse,
z.RollingInventory AS RunningTotal
FROM cteReverseInSum AS z
WHERE z.Department = w.Department
AND z.Category = w.Category
AND z.Item = w.Item
AND z.Invent_ID = w.Invent_ID
AND z.Locatn_ID = w.Locatn_ID
AND z.RollingInventory >= w.TotalInventory
ORDER BY z.Fifo_Rank DESC) AS LastPartialInventory),
LastCost
AS (SELECT DISTINCT Cogs.department,
Cogs.category,
Cogs.item,
Cogs.invent_id,
LastCost.LastCost
FROM cteWithLastTranDate Cogs
CROSS APPLY
dbo.siriusfn_LastCostUpToDate(Cogs.department, Cogs.category, Cogs.item, Cogs.invent_id, Cogs.LastDateTime) LastCost
WHERE Cogs.UseThisInventory IS NULL
OR Cogs.UseThisInventory = 0
OR Cogs.TotalInventory IS NULL
OR Cogs.TotalInventory = 0),
cteSource
AS (
SELECT y.Department,
y.Category,
y.Item,
y.Invent_ID,
y.Locatn_ID,
y.TotalInventory as QuantityOnHand,
SUM(CASE WHEN e.Fifo_Rank = y.Fifo_Rank
THEN y.UseThisInventory
ELSE e.Quantity END * Price.Unit_Cost) AS CostOfGoodsOnHand,
LastCost.LastCost
FROM cteWithLastTranDate AS y
LEFT JOIN #Inventory AS e ON e.Department = y.Department
AND e.Category = y.Category
AND e.Item = y.Item
AND e.Invent_ID = y.Invent_ID
AND e.Locatn_ID = y.Locatn_ID
AND e.Fifo_Rank >= y.Fifo_Rank
AND e.Trans_Type IN ('P', 'A', 'T')
LEFT JOIN LastCost
ON y.Department = LastCost.Department
AND y.Category = LastCost.Category
AND y.Item = LastCost.Item
AND y.Invent_ID = LastCost.Invent_ID
OUTER APPLY (
/* Find the Price of the item in */
SELECT TOP (1) p.Unit_Cost
FROM #Inventory AS p
WHERE p.Department = e.Department and
p.Category = e.Category and
p.Item = e.Item and
p.Invent_ID = e.Invent_ID and
p.Locatn_ID = e.Locatn_ID and
p.Fifo_Rank <= e.Fifo_Rank and
p.Trans_Type IN ('P', 'A', 'T')
ORDER BY p.Fifo_Rank DESC
) AS Price
GROUP BY y.Department,
y.Category,
y.Item,
y.Invent_ID,
y.Locatn_ID,
y.TotalInventory,
LastCost.LastCost)
SELECT Department,
Category,
Item,
Invent_ID,
Locatn_ID,
CONVERT(INT,QuantityOnHand) as QuantityOnHand,
COALESCE(CostOfGoodsOnHand,0) AS CostOfGoodsOnHand,
COALESCE(CASE
WHEN QuantityOnHand <> 0
AND CostOfGoodsOnHand <> 0 THEN CostOfGoodsOnHand /
QuantityOnHand
ELSE LastCost
END, 0) AS AverageCost
FROM cteSource
WHERE @bIncludeZeroes = 1
OR (@bIncludeZeroes = 0
AND CostOfGoodsOnHand <> 0)
ORDER BY Department,
Category,
Item,
Invent_ID,
Locatn_ID;
END
GO
/* Test Cases
CREATE TABLE [dbo].[#Inventory](
[pri_key] [int] IDENTITY(1,1) NOT NULL,
[ref_no] [numeric](17, 0) NOT NULL,
[locatn_id] [int] NOT NULL,
[date_time] [datetime] NOT NULL,
[fifo_rank] [bigint] NULL,
[department] [char](10) NOT NULL,
[category] [char](10) NOT NULL,
[item] [char](10) NOT NULL,
[invent_id] [int] NOT NULL,
[trans_type] [char](1) NOT NULL,
[quantity] [numeric](8, 2) NOT NULL,
[unit_cost] [money] NOT NULL
) ON [PRIMARY]
SET IDENTITY_INSERT [dbo].[#Inventory] ON;
BEGIN TRANSACTION;
INSERT INTO [dbo].[#Inventory]([pri_key], [ref_no], [locatn_id], [date_time],
[fifo_rank], [department], [category], [item], [invent_id], [trans_type],
[quantity], [unit_cost])
SELECT 774, 0, 1, '20120627 11:58:26.000', 1, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'T', 10.00, 2.0000 UNION ALL
SELECT 775, 129005001, 1, '20120627 13:02:57.000', 2, N'RETAIL ',
N'SUPPLIES ', N'BUG_SPRAY ', 0, N'S', -9.00, 0.0000 UNION ALL
SELECT 778, 0, 1, '20120627 13:06:07.000', 3, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'T', 10.00, 2.6667 UNION ALL
SELECT 779, 130005001, 1, '20120627 13:17:46.000', 4, N'RETAIL ',
N'SUPPLIES ', N'BUG_SPRAY ', 0, N'S', -7.00, 0.0000 UNION ALL
SELECT 780, 131005001, 1, '20120627 13:18:16.000', 5, N'RETAIL ',
N'SUPPLIES ', N'BUG_SPRAY ', 0, N'S', 3.00, 0.0000 UNION ALL
SELECT 772, 24, 3, '20120627 11:57:17.000', 1, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'P', 20.00, 2.0000 UNION ALL
SELECT 773, 0, 3, '20120627 11:58:26.000', 2, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'T', -10.00, 2.0000 UNION ALL
SELECT 776, 24, 3, '20120627 13:04:29.000', 3, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'P', 20.00, 3.0000 UNION ALL
SELECT 777, 0, 3, '20120627 13:06:07.000', 4, N'RETAIL ', N'SUPPLIES ',
N'BUG_SPRAY ', 0, N'T', -10.00, 2.6667
COMMIT;
RAISERROR (N'[dbo].[#Inventory]: Insert Batch: 1.....Done!', 10,
1) WITH NOWAIT;
*/
GO
You can see that this procedure uses the #Inventory temporary table, which also has a fifo_rank
column that is not present in the i_invent table in the database. I pre-select the rows I may be
interested in into the temporary #Inventory table and create the fifo_rank column using the ROW_NUMBER()
function, partitioning by the 5 columns that identify a single inventory item and ordering by the
date_time and po_link columns. You can also see that this procedure references the function
siriusfn_LastCostUpToDate. This function calculates the last cost of the item to date using an iterative
approach: it first tries to calculate it for the specific invent_id (invent_id <> 0 is for "matrix" items,
e.g. items that may come in different sizes or colors). If there are no rows for the specific invent_id, it
tries to get the last cost for the item itself regardless of invent_id. If the cost is still unknown, it
checks the purchase orders table (i_pchord), again first for the invent_id and then for the item itself.
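The pre-selection step described above is not reproduced in the article; a minimal sketch of it
(assuming the i_invent source table and the po_link column mentioned in the text) could look like:

```sql
-- Sketch of building #Inventory with the fifo_rank column:
-- partition by the 5 columns that identify an inventory item,
-- order by date_time and po_link, as described in the text.
SELECT department, category, item, invent_id, locatn_id,
       ref_no, date_time, trans_type, quantity, unit_cost,
       ROW_NUMBER() OVER (
           PARTITION BY department, category, item, invent_id, locatn_id
           ORDER BY date_time, po_link) AS fifo_rank
INTO #Inventory
FROM dbo.i_invent;
```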
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM cteRcvdMatrix i
OUTER APPLY (
SELECT TOP 1 unit_cost
FROM dbo.i_invent ii
WHERE trans_type IN (
'P'
,'A'
,'T'
)
AND i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
,ctePOMatrix AS (
SELECT *
FROM cteRcvdItem
WHERE LastCost IS NOT NULL
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM cteRcvdItem i
OUTER APPLY (
SELECT TOP (1) unit_cost
FROM dbo.i_pchord ii
WHERE i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND i.invent_id = ii.invent_id
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
,ctePOItem AS (
SELECT *
FROM ctePOMatrix
WHERE LastCost IS NOT NULL
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM ctePOMatrix i
OUTER APPLY (
SELECT TOP (1) unit_cost
FROM dbo.i_pchord ii
WHERE i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
SELECT i.department
,i.category
,i.item
,i.invent_id
,coalesce(i.LastCost, 0) AS LastCost
FROM ctePOItem i
GO
/* Test Cases
set statistics io on
SELECT * FROM dbo.siriusfn_LastCost('RT34HANDW','058GLOVEL', '19599 ',
409) -- RT34HANDW 058GLOVEL 19599
SELECT * FROM dbo.siriusfn_LastCostUpToDate('RT34HANDW','058GLOVEL', '19599    ', 409,'20040101')
-- select top (1) * from dbo.i_invent where invent_id = 409 and trans_type
in ('A','P','T') and quantity > 0 order by date_time desc
set statistics io off
*/
Now I am going to discuss the procedure I am using to calculate Cost of Goods Sold using FIFO method.
About a year ago I spent a lot of time creating two versions of the procedure - one for SQL Server 2005-
2008 and one for SQL Server 2012. I thought I tested these procedures extensively and had them
working great. Our testers also tested them in various scenarios (I hope). It turned out I had not tested
them well enough, and they were failing in a really simple scenario. Also, our client found a more
complex scenario, was able to analyze these procedures, and showed their faults.
Therefore I needed to look at them again and fix the problems.
So, I looked at them recently, and I had to admit I could not really understand what I was doing in
them. I think if I had written this article then rather than now, it might have helped. So, by documenting
my line of thought now in creating this procedure, and also accepting revisions from other people, it
may help me (and others) to perfect this procedure in the future, or re-design it again if needed.
The scenario that my colleague found failing in the last implementation of the procedure was the
following:
So, I decided I was going to try to re-write this procedure again rather than trying to figure out what
that procedure was doing and where the bug might be. I also found the following thread on MSDN, SQL FIFO
Query, which I had already used in my prior attempts to solve the FIFO Cost of Goods Sold problem. This time
I concentrated on Peter Larsson's (SwePeso's) solution in that thread.
In the procedure that is invoked before the FIFO Cost of Goods Sold procedure is called, I select
inventory items according to the user's selections (say, for the Profit and Loss report the user can select
a particular department (or department and category), may select a specific vendor, and also selects a date
range interval). So, I select rows into the #Inventory temp table up to the end date of the selected date
interval. I again add the FIFO_RANK column and, for simplicity, also add a numerical InvNo column using the
DENSE_RANK() function ordered by Department, Category, Item, Invent_ID, Locatn_ID. This is done in order to
use a single integer column to identify each inventory item rather than 5 columns. In my calculations I also
use the dbo.Numbers table, which has a single number column. In our database that table contains numbers
from ~-100K to 100K.
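A sketch of the InvNo pre-step described above (the filtering by the user's selections is omitted):

```sql
-- Add a single surrogate integer per inventory item/location,
-- as described in the text, using DENSE_RANK() over the 5 key columns.
SELECT *,
       DENSE_RANK() OVER (
           ORDER BY Department, Category, Item, Invent_ID, Locatn_ID) AS InvNo
FROM #Inventory;
```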
The idea of the new design of this procedure is to calculate the starting point of the inventory in one
step (up to Start Date - dtStart parameter) using Peter's idea and then process each individual sale or
return (and negative quantity transfers) within the selected date intervals. The final result should have
all sales and returns in the selected period (quantity and unit_cost).
So, I decided to introduce yet another temporary table I called #MovingInventory. In this table I have
the InvNo column (the artificial item id for each inventory item in a location that I created in the
pre-step), the fifo_rank column, quantity (the same quantity as in #Inventory), CurrentQuantity (which
reflects the current remaining quantity), Removed (quantity removed) and Returned (quantity
returned). If we are to change our current inventory process, we may create this table as a permanent
table in the database and update it on each inventory movement. We could also create an InventorySales
table. Using these tables would significantly simplify the current calculation process.
So, we start with populating this new #MovingInventory temporary table with all positive additions to
the inventory with their unit_cost. I set CurrentQuantity to quantity and Returned and Removed to 0.
I have two more temporary tables used in this procedure. #Sales will be used to generate our
final result; it will contain all sales and returns in the specified date range with the quantity sold
(or returned) and the unit cost used.
I also have the #Removed table. I could have used a table variable here instead, but I recall having some
problems with a table variable in the prior version of this procedure, so I decided to use a
temporary table again. This table holds the items removed (or returned) on each iteration, and
it is cleaned (truncated) on each iteration.
Here is the definition of these 2 temporary tables at the top of the procedure:
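The original definitions did not survive in this copy of the article; based on the columns referenced
later in the procedure, they presumably look something like this (the exact types in the original may
differ):

```sql
-- Reconstructed definitions (types inferred from the #Inventory DDL above)
CREATE TABLE #Sales (
    trans_no   NUMERIC(17, 0),
    InvNo      INT,
    locatn_id  INT,
    date_time  DATETIME,
    department CHAR(10),
    category   CHAR(10),
    item       CHAR(10),
    invent_id  INT,
    unit_cost  MONEY,
    quantity   NUMERIC(8, 2));

CREATE TABLE #Removed (
    unit_cost  MONEY,
    Removed    NUMERIC(8, 2));
```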
So, my first step is to calculate prior inventory in one step. Here is the code I use for this:
WHILE (@@FETCH_STATUS = 0)
BEGIN
SELECT @fifo_rank = MAX(fifo_rank)
,@Removed = - 1 * SUM(quantity)
FROM #Inventory
WHERE date_time < @dtStart
AND (
trans_type = 'S'
OR quantity < 0
)
AND InvNo = @InvNo;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.Quantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE M
SET Removed = R.Removed
,CurrentQuantity = M.CurrentQuantity - R.Removed
FROM #MovingInventory M
INNER JOIN cteRemoved R ON M.fifo_rank = R.fifo_rank
WHERE M.InvNo = @InvNo;
-- We can also check if Removed = @Removed (if less, we have negative inventory - an unlikely situation)
FETCH NEXT FROM curMainProcess INTO @InvNo; -- advance the cursor; presumably lost from the captured listing
END
Here I am attempting to calculate our current working inventory in one step. I get the total sold quantity
and the last date (fifo_rank) on which the item was sold prior to dtStart, and then distribute that sold
quantity among all prior additions to the inventory.
Here I am not considering situations where we somehow already sold more than we originally had in the
inventory, or where we returned more than we sold (so the total quantity would be greater than 0). To be
honest, I am not 100% sure how to treat these situations, so I assume that the possibility of them
occurring is very low.
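To see why the rolling-removal UPDATE above works, here is a minimal standalone illustration of the
unit-expansion trick (my own toy data; it assumes the dbo.Numbers table described earlier):

```sql
-- Toy illustration: two receipts of 20 units each (at $2 and $3),
-- and 25 units sold so far. Each receipt is exploded into one row
-- per unit via dbo.Numbers, TOP takes the 25 oldest units, and the
-- GROUP BY collapses them back into per-receipt removal counts.
DECLARE @Removed INT = 25;
WITH Receipts AS (
    SELECT * FROM (VALUES (1, 20, 2.00), (2, 20, 3.00))
                  AS r(fifo_rank, quantity, unit_cost)
),
cteSource AS (
    SELECT TOP (@Removed) r.fifo_rank, r.unit_cost
    FROM Receipts AS r
    CROSS APPLY (SELECT TOP (r.quantity) number
                 FROM dbo.Numbers WHERE number > 0) AS f
    ORDER BY r.fifo_rank
)
SELECT fifo_rank, unit_cost, COUNT(*) AS Removed
FROM cteSource
GROUP BY fifo_rank, unit_cost;
-- fifo_rank 1 removes all 20 units; fifo_rank 2 removes the remaining 5
```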
Once we have the inventory up to the starting date (dtStart), I am ready to process each individual sale or
return. Here is how I do it for sales and negative transfers:
WHILE (@@FETCH_STATUS = 0)
BEGIN
IF @quantity < 0 -- Sale or transfer
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sale or transfer with quantity = ' + CAST(-1 * @quantity AS VARCHAR(20))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.CurrentQuantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.CurrentQuantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND s.CurrentQuantity > 0
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,CurrentQuantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,CurrentQuantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity - R.Removed
,Removed = I.Removed + R.Removed
OUTPUT Inserted.unit_cost
,Inserted.Removed - deleted.Removed
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteRemoved R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Current Moving Inventory after Sale or Return'
,10
,1
)
WITH NOWAIT
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,Removed
FROM #Removed;
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
So, for each sale (or negative transfer) I use the same idea as in calculating the starting inventory.
I remove the sold quantity, distributing it among the rows where CurrentQuantity > 0, ordered by the
date_time (fifo_rank) column. I then update the #MovingInventory table (the CurrentQuantity and Removed
columns) and output the results, using the OUTPUT clause of the UPDATE, into the #Removed table. In
addition, I populate the #Sales table if the Trans_Type is 'S' (sale), to be used in the final select statement.
I also try to handle situations where we sold (or moved out) more than we have in the inventory. In this
case we use the Last Cost for the item.
Here lies another problem not currently considered: if we have a negative quantity balance, we need
to keep decrementing that difference after we receive that item again. This is not currently done in my
procedure, so we may get an incorrect Cost of Goods Sold in such scenarios. I may need to think more
about how to handle this problem.
For returns I use a process similar to the one for sales, but I give back what I've already removed in
the opposite direction (i.e. last removed, first returned). So, this is how I handle
returns:
WITH cteSource
AS (
SELECT TOP (@Returned) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.Removed - s.Returned) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND (s.Removed - s.Returned) > 0
ORDER BY s.fifo_rank DESC -- returns in the LIFO order
)
,cteReturned
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Returned
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity + R.Returned
,Returned = I.Returned + R.Returned
OUTPUT Inserted.unit_cost
,Inserted.Returned - deleted.Returned
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteReturned R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Result after return'
,10
,1
)
WITH NOWAIT;
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,(- 1) * Removed
FROM #Removed; -- handle returns
-- Need to check for situations when we return what we didn't have in the inventory before
IF @Debug = 1
BEGIN
SELECT *
FROM #Sales;
RAISERROR (
'Current Sales after return'
,10
,1
)
WITH NOWAIT;
END
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
Here again, if we returned more than we originally removed, I process the return using the last known
cost for the item.
Now I will give you the whole procedure code, and hopefully you will see my logic. I will also appreciate
comments or code corrections, as this is still a work in progress and hasn't been tested extensively yet.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(53 AS NUMERIC(17, 0))
,1
,CAST(0x0000A20000FF6D74 AS DATETIME)
,1
,1
,N'RETAIL '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,10.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(53 AS NUMERIC(17, 0))
,1
,CAST(0x0000A20000FF6D74 AS DATETIME)
,2
,1
,N'RETAIL '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,5.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(136005001 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967D8 AS DATETIME)
,3
,1
,N'RETAIL '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'S'
,CAST(- 50.00 AS NUMERIC(8, 2))
,0.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(54 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967DA AS DATETIME)
,4
,1
,N'RETAIL '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,7.5000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(136005002 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967DE AS DATETIME)
,5
,1
,N'RETAIL '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'S'
,CAST(- 50.00 AS NUMERIC(8, 2))
,0.0000
)
GO
IF NOT EXISTS (
SELECT *
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_NAME = 'siriussp_CostOfGoodsSold_FIFO'
AND ROUTINE_TYPE = 'PROCEDURE'
)
EXECUTE
('CREATE PROCEDURE dbo.siriussp_CostOfGoodsSold_FIFO AS SET NOCOUNT ON;');
GO
ALTER PROCEDURE dbo.siriussp_CostOfGoodsSold_FIFO (
@dtStart DATETIME
,@Debug BIT = 0
)
--=============================================================
/* SP that returns total quantity and cost of goods sold
by department, category, item, invent_id, and locatn_id,
using FIFO (First IN, First OUT) method of cost valuation.
Modified on 07/10/2012
Modified on 07/19/2013 - 7/26/2013
--=============================================================
*/
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS (
SELECT NAME
FROM TempDB.sys.sysindexes
WHERE NAME = 'idx_Inventory_fifo_rank'
)
CREATE INDEX idx_Inventory_fifo_rank ON #Inventory (
InvNo
,fifo_rank
);
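Note: the variable and cursor declarations were lost from this listing. A plausible reconstruction,
inferred from how the variables and cursors are used below (the originals may differ), is:

```sql
-- Hypothetical reconstruction of the missing declarations
DECLARE @InvNo INT, @ref_no NUMERIC(17, 0), @date_time DATETIME,
        @fifo_rank BIGINT, @quantity NUMERIC(8, 2), @unit_cost MONEY,
        @trans_type CHAR(1), @department CHAR(10), @category CHAR(10),
        @item CHAR(10), @invent_id INT, @locatn_id INT,
        @Removed INT, @Returned INT, @LastCost MONEY,
        @Message NVARCHAR(500);

DECLARE curMainProcess CURSOR LOCAL FAST_FORWARD
FOR SELECT DISTINCT InvNo FROM #Inventory;

DECLARE curProcess CURSOR LOCAL FAST_FORWARD
FOR SELECT InvNo, ref_no, date_time, fifo_rank, quantity, unit_cost,
           trans_type, department, category, item, invent_id, locatn_id
    FROM #Inventory
    ORDER BY InvNo, fifo_rank;
```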
OPEN curMainProcess;
FETCH NEXT
FROM curMainProcess
INTO @InvNo;
WHILE (@@FETCH_STATUS = 0)
BEGIN
SELECT @fifo_rank = MAX(fifo_rank)
,@Removed = - 1 * SUM(quantity)
FROM #Inventory
WHERE date_time < @dtStart
AND (
trans_type = 'S'
OR quantity < 0
)
AND InvNo = @InvNo;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.Quantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE M
SET Removed = R.Removed
,CurrentQuantity = M.CurrentQuantity - R.Removed
FROM #MovingInventory M
INNER JOIN cteRemoved R ON M.fifo_rank = R.fifo_rank
WHERE M.InvNo = @InvNo;
-- We can also check if Removed = @Removed (if less, we have negative inventory - an unlikely situation)
FETCH NEXT FROM curMainProcess INTO @InvNo; -- advance the cursor; presumably lost from the captured listing
END
CLOSE curMainProcess;
DEALLOCATE curMainProcess;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory
WHERE InvNo = @InvNo;
RAISERROR (
'Done with the prior inventory - starting checking Sales
we''re interested in'
,10
,1
)
WITH NOWAIT;
END
OPEN curProcess
FETCH NEXT
FROM curProcess
INTO @InvNo
,@ref_no
,@date_time
,@fifo_rank
,@quantity
,@unit_cost
,@trans_type
,@department
,@category
,@item
,@invent_id
,@locatn_id
WHILE (@@FETCH_STATUS = 0)
BEGIN
IF @quantity < 0 -- Sale or transfer
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sale or transfer with quantity = ' + CAST(-1 * @quantity AS VARCHAR(20))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.CurrentQuantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.CurrentQuantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND s.CurrentQuantity > 0
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,CurrentQuantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,CurrentQuantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity - R.Removed
,Removed = I.Removed + R.Removed
OUTPUT Inserted.unit_cost
,Inserted.Removed - deleted.Removed
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteRemoved R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Current Moving Inventory after Sale or Return'
,10
,1
)
WITH NOWAIT
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,Removed
FROM #Removed;
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
IF @Debug = 1
BEGIN
SET @Message = 'Last Cost = ' + CAST(@LastCost AS VARCHAR(10))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
SELECT *
FROM #Sales
RAISERROR (
'Currently in #Sales'
,10
,1
)
WITH NOWAIT;
END
END
END
ELSE -- Returns
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Return with quantity = ' + CAST(@quantity AS VARCHAR(20));
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SELECT @Returned = @quantity;
WITH cteSource
AS (
SELECT TOP (@Returned) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.Removed - s.Returned) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND (s.Removed - s.Returned) > 0
ORDER BY s.fifo_rank DESC -- returns in the LIFO order
)
,cteReturned
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Returned
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity + R.Returned
,Returned = I.Returned + R.Returned
OUTPUT Inserted.unit_cost
,Inserted.Returned - deleted.Returned
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteReturned R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Result after return'
,10
,1
)
WITH NOWAIT;
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,(- 1) * Removed
FROM #Removed;-- handle returns
-- Need to check for situations when we return what we didn't have in the inventory before
IF @Debug = 1
BEGIN
SELECT *
FROM #Sales;
RAISERROR (
'Current Sales after return'
,10
,1
)
WITH NOWAIT;
END
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
FETCH NEXT
FROM curProcess
INTO @InvNo
,@ref_no
,@date_time
,@fifo_rank
,@quantity
,@unit_cost
,@trans_type
,@department
,@category
,@item
,@invent_id
,@locatn_id
END -- while
CLOSE curProcess
DEALLOCATE curProcess
FETCH NEXT
FROM curMainProcess
INTO @InvNo
END -- while
CLOSE curMainProcess
DEALLOCATE curMainProcess
IF @Debug = 1
BEGIN
SET @Elapsed = datediff(second, @StartTime, CURRENT_TIMESTAMP);
PRINT ' Finished with the creation of #Sales tables using cursor in ' + cast(@Elapsed AS VARCHAR(30)) + ' seconds';
END
SELECT S.trans_no
,S.department
,S.category
,S.item
,S.invent_id
,S.locatn_id
,SUM(S.quantity) AS QuantitySold
,CAST(SUM(S.quantity * S.unit_cost) AS MONEY) AS CostOfGoodsSold
FROM #Sales S
GROUP BY S.trans_no
,S.department
,S.category
,S.item
,S.invent_id
,S.locatn_id;
IF @Debug = 1
BEGIN
SET @Elapsed = datediff(second, @StartTime, CURRENT_TIMESTAMP);
END -- closing END restored; it was cut off in this listing
RETURN;
GO
/* Test Cases
IF OBJECT_ID('TempDB..#Inventory',N'U') IS NOT NULL DROP TABLE #Inventory;
CREATE TABLE [dbo].[#Inventory](
[InvNo] [int] NOT NULL,
[ref_no] [numeric](17, 0) NOT NULL,
[locatn_id] [int] NOT NULL,
[date_time] [datetime] NOT NULL,
[fifo_rank] [bigint] NULL,
[department] [char](10) NOT NULL,
[category] [char](10) NOT NULL,
[item] [char](10) NOT NULL,
[invent_id] [int] NOT NULL,
[trans_type] [char](1) NOT NULL,
[quantity] [numeric](8, 2) NOT NULL,
[unit_cost] [money] NOT NULL
)
;with cte as (SELECT N'25' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
16:48:39.000' AS [date_time], N'1' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'P' AS [trans_type], N'100.00' AS [quantity], N'1.00' AS
[unit_cost] UNION ALL
SELECT N'133005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
17:00:13.000' AS [date_time], N'2' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-90.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'25' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29 17:26:47.000' AS
[date_time], N'3' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'100.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'135005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
17:28:19.000' AS [date_time], N'4' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'10.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27 11:58:26.000' AS
[date_time], N'1' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'10.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'129005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:02:57.000' AS [date_time], N'2' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-9.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27 13:06:07.000' AS
[date_time], N'3' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'10.00' AS [quantity], N'2.6667' AS [unit_cost] UNION ALL
SELECT N'130005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:17:46.000' AS [date_time], N'4' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-7.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'131005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:18:16.000' AS [date_time], N'5' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'3.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'24' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 11:57:17.000' AS
[date_time], N'1' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 11:58:26.000' AS
[date_time], N'2' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-10.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'24' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 13:04:29.000' AS
[date_time], N'3' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'3.00' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 13:06:07.000' AS
[date_time], N'4' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-10.00' AS [quantity], N'2.6667' AS [unit_cost] UNION ALL
SELECT N'4' AS [ref_no], N'1' AS [locatn_id], N'2011-04-03 18:34:44.000' AS
[date_time], N'1' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'24.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'11005001' AS [ref_no], N'1' AS [locatn_id], N'2011-04-07
09:57:51.000' AS [date_time], N'2' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'33005001' AS [ref_no], N'1' AS [locatn_id], N'2011-04-07
10:04:39.000' AS [date_time], N'3' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'103005001' AS [ref_no], N'1' AS [locatn_id], N'2011-07-06
17:55:17.000' AS [date_time], N'4' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'108005001' AS [ref_no], N'1' AS [locatn_id], N'2011-07-06
17:55:47.000' AS [date_time], N'5' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'115005001' AS [ref_no], N'1' AS [locatn_id], N'2011-08-01
17:47:11.000' AS [date_time], N'6' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'41005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:24:03.000' AS [date_time], N'7' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-2.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'48005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:38:31.000' AS [date_time], N'8' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-3.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'65005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:59:59.000' AS [date_time], N'9' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'1' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:02:19.000' AS
[date_time], N'10' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'A' AS [trans_type], N'5.00' AS [quantity], N'0.75' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:09:46.000' AS
[date_time], N'11' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'A' AS [trans_type], N'5.00' AS [quantity], N'0.10' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:15:05.000' AS
[date_time], N'12' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'T' AS [trans_type], N'5.00' AS [quantity], N'0.5469' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:15:47.000' AS
[date_time], N'13' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'T' AS [trans_type], N'5.00' AS [quantity], N'0.5469' AS
[unit_cost] UNION ALL
SELECT N'125005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:00:26.000' AS [date_time], N'14' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'-10.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'126005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:01:05.000' AS [date_time], N'15' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'5.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'127005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:02:07.000' AS [date_time], N'16' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'-50.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'128005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:02:51.000' AS [date_time], N'17' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'30.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'5' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 16:41:21.000' AS
[date_time], N'1' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'60.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'1' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 17:46:45.000' AS
[date_time], N'2' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'A'
AS [trans_type], N'-2.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'4' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 18:34:44.000' AS
[date_time], N'3' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-24.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'23' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:00:58.000' AS
[date_time], N'4' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'10.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'23' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:04:59.000' AS
[date_time], N'5' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'0.10' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:15:05.000' AS
[date_time], N'6' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-5.00' AS [quantity], N'0.5469' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:15:47.000' AS
[date_time], N'7' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-5.00' AS [quantity], N'0.5469' AS [unit_cost] )
insert #Inventory ([ref_no], [locatn_id], [date_time], [fifo_rank], [InvNo],
[department], [category], [item], [invent_id], [trans_type], [quantity],
[unit_cost])
SELECT [ref_no], [locatn_id], [date_time], [fifo_rank], [InvNo],
[department], [category], [item], [invent_id], [trans_type], [quantity],
[unit_cost]
from cte
--CREATE INDEX idx_Inventory_fifo_rank ON #Inventory (InvNo, fifo_rank)
At the top of the script, I provided the #Inventory table for the original failing scenario in order to confirm that the new code handles it correctly. I also have a scenario I tested originally in the comments after the stored procedure.
Summary
In this article, I described the process of working on the complex problem of calculating the Cost of Goods Sold using the FIFO method and gave my current procedure code. I also showed potential problems and flaws in that code. I would appreciate comments and ideas for improving this algorithm.
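To summarize the core algorithm outside T-SQL, here is a minimal sketch of FIFO costing in Python. This is only an illustration of the idea, not a translation of the procedure: purchases are queued in arrival order, and each sale consumes the oldest layers first.

```python
from collections import deque

def fifo_cogs(transactions):
    """transactions: list of (quantity, unit_cost); positive = purchase, negative = sale.
    Returns the total cost of goods sold, consuming the oldest purchase layers first."""
    layers = deque()  # each entry: [remaining_qty, unit_cost], oldest first
    cogs = 0.0
    for qty, cost in transactions:
        if qty > 0:
            layers.append([qty, cost])  # new purchase layer
        else:
            to_remove = -qty
            while to_remove > 0 and layers:
                layer = layers[0]
                take = min(layer[0], to_remove)  # drain the oldest layer first
                cogs += take * layer[1]
                layer[0] -= take
                to_remove -= take
                if layer[0] == 0:
                    layers.popleft()
    return cogs

# e.g. buy 100 @ 1.00, sell 90, buy 100 @ 2.00, sell 20
# COGS = 90*1.00 + 10*1.00 + 10*2.00 = 120.00
```

The `#MovingInventory` updates in the procedure play the role of the `layers` queue here; negative inventory and returns (handled separately above) are left out of this sketch.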
T-SQL: Gaps and Islands Problem
This article considers a simple, classical Gaps & Islands problem asked recently in the Transact-SQL Forum at MSDN under the unoriginal title "Query Help".
Problem Definition
The thread originator was kind enough to provide the DDL of the table and some data describing the task:
Create table T1
(Id int identity primary key,
VoucherNo varchar(4),
TransNo varchar(10)
)
Solution
As mentioned, this is a common problem in Transact-SQL; it was described by Itzik Ben-Gan here and by Plamen Ratchev in his easy-to-understand blog post Refactoring Ranges. Knowing the main idea of the solution, it is easy to apply it, assuming that all voucher numbers come in the following format (the letter V followed by a 3-digit number):
;WITH cte
AS (
SELECT *
,CAST(SUBSTRING(VoucherNo, 2, 3) AS INT) - ROW_NUMBER() OVER (
ORDER BY VoucherNo
) AS Grp
FROM T1
)
SELECT TransNo
,min(VoucherNo) AS FirstVoucherNo
,max(VoucherNo) AS LastVoucherNo
,count(*) AS Quantity
FROM cte
GROUP BY TransNo
,Grp
So, the idea of this solution is to first group consecutive ranges using the ROW_NUMBER() function and then apply aggregate functions based on that group identifier.
Note that it is easy to modify this query to work with different formats of the voucher number (say, some combination of letters followed by a number of any length). This article concentrates on the problem posted by the thread originator and solves it for that particular voucher number format. You may want to see some modifications of my solution suggested by Ronen Ariely in the original thread.
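To make the grouping trick concrete, here is a small Python sketch with hypothetical voucher values: subtracting the row number from the numeric part of the voucher yields a constant within each consecutive run, which is exactly what the `Grp` column in the CTE computes.

```python
# islands = runs of consecutive voucher numbers
vouchers = ["V001", "V002", "V003", "V007", "V008", "V010"]

islands = {}
for i, v in enumerate(sorted(vouchers)):
    grp = int(v[1:]) - i          # value minus row number: constant within a run
    islands.setdefault(grp, []).append(v)

# first voucher, last voucher, and count per island - like MIN/MAX/COUNT(*) per Grp
result = [(vals[0], vals[-1], len(vals)) for vals in islands.values()]
# result: [('V001', 'V003', 3), ('V007', 'V008', 2), ('V010', 'V010', 1)]
```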
Crazy TSQL Queries play time
Background
Most of the articles in the Wiki try to bring us tutorials on a specific topic or the best solution for a specific problem. This post is different! It has nothing to do with optimization, query cost, the best solution (getting the best query) or tutorials; instead, it is all about crazy queries that reproduce a basic "built-in feature" (an action or a function, for example) without using that built-in feature.
The idea for this post came from lots of questions we can find in forums that look like they had no reason to be asked in the first place (for example, this question from the MSDN SQL Hebrew forum). These questions most likely come from job interviews, courses, exams and riddles. For example: "how can we build a UNION query using JOIN", or "how can we build a JOIN operation without using JOIN".
While none of these queries should be used on a production server, they are a great way to make sure that we really understand the operation or function we are trying to replace, as well as the ones we use to replace it.
Please feel free to add any idea, crazy as it is, as long as it requires the ability to understand the feature you are writing about :-)
Learning about UNION is simple, and learning about JOIN can be done in an hour, but how many of us really understand their meaning and are able to convert a JOIN to a UNION and vice versa?
/******************************************** DDL+DML */
CREATE TABLE invoices (custId int,docNo int,docSum smallmoney)
CREATE TABLE creditNotes (custId int,docNo int,docSum smallmoney)
GO
SELECT
COALESCE(I.custId, C.custId) as custId
,COALESCE(I.docNo, C.docNo) as docNo
,COALESCE(I.docSum, C.docSum) as docSum
from invoices I
FULL OUTER JOIN creditNotes C ON 1=0
where I.custId = 1234 or C.custId = 1234
GO
/******************************************** DDL+DML */
CREATE TABLE UsersTbl (UserId int, Name nvarchar(100))
CREATE TABLE NotesTbl (UserId int,DocContent nvarchar(100))
GO
select
N.UserId NUserId,N.DocContent NDocContent,N.UserId
UUserId,(select Name from UsersTbl U where U.UserId = N.UserId) UName
from NotesTbl N
where N.UserId in (select UserId from UsersTbl)
GO
select
N.UserId NUserId,N.DocContent NDocContent,N.UserId
UUserId,(select Name from UsersTbl U where U.UserId = N.UserId) UName
from NotesTbl N
where N.UserId in (select UserId from UsersTbl)
UNION ALL
select NULL,NULL,UserId,Name
from UsersTbl
where UserId not in (select UserId from NotesTbl)
GO
* We can use the above queries and UNION to get both LEFT JOIN and RIGHT JOIN result sets.
-- using our "LEFT JOIN" query without the filter on first result set
select
N.UserId NUserId,N.DocContent NDocContent,(select U.UserId from UsersTbl
U where U.UserId = N.UserId) UUserId,(select Name from UsersTbl U where U.UserId
= N.UserId) UName
from NotesTbl N
UNION ALL
select NULL,NULL,UserId,Name
from UsersTbl
where UserId not in (select UserId from NotesTbl)
GO
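The same idea can be sketched outside SQL. In this Python illustration (the sample rows are hypothetical), each note row does a correlated lookup into the users table, and a second pass appends the users that have no notes, just as the UNION ALL branch above does:

```python
notes = [(1, "note A"), (2, "note B"), (5, "note C")]
users = {1: "Alice", 2: "Bob", 3: "Carol"}

# "LEFT JOIN" without a join: a correlated lookup per note row
left = [(uid, doc, uid if uid in users else None, users.get(uid))
        for uid, doc in notes]

# append users with no notes (the UNION ALL branch) to obtain the FULL JOIN shape
note_uids = {uid for uid, _ in notes}
full = left + [(None, None, uid, name)
               for uid, name in users.items() if uid not in note_uids]
# full contains (5, 'note C', None, None) and (None, None, 3, 'Carol')
```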
Playing with NULL
What is so confusing about NULL that makes it a great subject for debates?
NULL is not equal to NULL.
That makes it a great playground for us.
Let's start with a simple example. The ISNULL function replaces the first parameter with the specified replacement value if it is NULL. The COALESCE function returns the value of the first expression in a list that does not evaluate to NULL.
DECLARE @QQ01 INT, @QQ02 INT, @QQ03 INT, @QQ04 INT -- sample variables; the declarations were cut off in this listing
select COALESCE(@QQ01,@QQ02,@QQ03,@QQ04)
select ISNULL(@QQ01,ISNULL(@QQ02,ISNULL(@QQ03,@QQ04)))
GO
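The equivalence of COALESCE and a nested chain of ISNULL calls can be demonstrated with a quick sketch in Python (hypothetical helper functions, using None to stand for NULL):

```python
def isnull(value, replacement):
    """Two-argument ISNULL: replacement if value is NULL, else value."""
    return replacement if value is None else value

def coalesce(*args):
    """COALESCE: first argument that is not NULL, else NULL."""
    for a in args:
        if a is not None:
            return a
    return None

# COALESCE(a, b, c, d) behaves like ISNULL(a, ISNULL(b, ISNULL(c, d)))
```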
There are a lot of questions about the difference between a "Cursor" and a "While Loop". It is a fundamental mistake to compare them at all. It's like comparing a car and a boat: we use a car to move on land, and we use a boat to travel at sea. I would not recommend anyone try the opposite. That could be another playground for us here.
/******************************************** DDL+DML */
CREATE TABLE CursorAndLoopTbl(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Txt NVARCHAR(100)
)
GO
FETCH NEXT
FROM MyCursor
INTO @MyVar
END
CLOSE MyCursor
GO
DEALLOCATE MyCursor
GO
-- Using Loop
DECLARE @Counter INT = 1
DECLARE @RowNum INT = (SELECT COUNT(*) FROM CursorAndLoopTbl)
DECLARE @MyVar as NVARCHAR(100) = (select Txt from CursorAndLoopTbl where ID = 1)
WHILE @Counter < @RowNum -- loop header restored; it was cut off in this listing
BEGIN
SET @Counter += 1
SELECT @MyVar = (select Txt from CursorAndLoopTbl where ID = @Counter)
END
GO
DROP TABLE CursorAndLoopTbl
GO
* The idea for this post came from the question here (Hebrew):
http://social.technet.microsoft.com/Forums/he-IL/03fa90e1-1a2a-4756-8ca3-44ac3b015cf1/-
?forum=sqlhe
There are dozens of similar questions online :-)
* Cursor
http://technet.microsoft.com/en-us/library/ms181441.aspx
* WHILE
http://technet.microsoft.com/en-us/library/ms178642.aspx
* I highly recommend checking this link if you are thinking about comparing a "Cursor" and a "While Loop":
http://ariely.info/Blog/tabid/83/EntryId/132/SQL-Server-cursor-loop.aspx
CHAPTER 11:
CLR
RegEx Class
A slightly boring class for doing some regex...
I've stored my notes in my pWord program, which I also use for tracking passwords; it will always be open source: www.sourceforge.net/projects/pword .
using System;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
using System.Data;
using System.Data.SqlTypes;
using System.Text;
namespace RegEx {
public class RX {
[SqlFunction]
public static bool IsMatch(String input, String pattern) {
// Regex.Match()
MatchCollection mc1 = Regex.Matches(input, pattern);
if (mc1.Count > 0)
return true;
else
return false;
}
[SqlFunction]
public static int GetMatchCount(String input, String
pattern) {
// Regex.Match()
MatchCollection mc1 = Regex.Matches(input, pattern);
return mc1.Count;
}
[SqlFunction]
public static String GetMatch(String input, String pattern)
{
MatchCollection mc1 = Regex.Matches(input, pattern,
RegexOptions.Compiled);
String output = "";
if (mc1.Count > 0)
{
foreach (Match m1 in mc1)
{
output = m1.ToString();
// only find the first occurrence ;)
break;
}
return output;
}
else
return "";
}
[SqlFunction]
public static String GetAllMatches(String input, String pattern)
{
MatchCollection mc1 = Regex.Matches(input, pattern,
RegexOptions.Compiled);
StringBuilder output = new StringBuilder();
output.Append("");
if (mc1.Count > 0)
{
foreach (Match m1 in mc1)
{
output.Append( m1.ToString());
// append every occurrence
}
return output.ToString();
}
else
return "";
}
}
}
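For comparison, the behavior of GetMatch and GetAllMatches can be mirrored in a few lines of Python. This is only an illustration of what the class returns, not part of the SQL Server deployment:

```python
import re

def get_match(text, pattern):
    """First match only, or '' when there is none - mirrors RX.GetMatch."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

def get_all_matches(text, pattern):
    """All matches concatenated into one string - mirrors RX.GetAllMatches."""
    return "".join(m.group(0) for m in re.finditer(pattern, text))
```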
Check out the SQL Script for adding UDFs into SQL Server 2012.
SQL Server Resource Re-Balancing in Failover Cluster
The poster asked how to automatically adjust SQL Server's max server memory setting following a cluster failover (see here). I provided the following script, with suggestions for how it could be tailored to their environment.
USE [master]
GO
SET QUOTED_IDENTIFIER ON
GO
/*
AUTHOR: ANDREW BAINBRIDGE - MS SQL DBA
DATE: 13/05/2013
VERSION: 2.0
If there is a cluster failover that results in both SQL Server instances running on the same node, this script will automatically rebalance the amount of RAM allocated to each instance. This is to prevent the combined RAM allocated to SQL Server from overwhelming the node.
If the D: drive and the H: drive are visible to the same host, both instances are running on the same node. In this event, the amount of RAM allocated to each of the SQL Servers will be 90% of half the amount of total RAM in the server, e.g. (384GB / 2) * 0.9.
If only the D: drive or the H: drive is visible, then 90% of the total amount of RAM available on the server is allocated to the SQL Server instance.
This stored procedure will also set the max server memory of the other SQL Server instance in the
cluster. As this needs to be run across the linked server, and the sp_procoption startup procedure is
owned by SA (therefore can't use windows authentication), the stored procedure will be run on SQL
Server Agent startup, via a job.
*/
BEGIN
-- (the IF branch that detects both drives on one node was cut off in this excerpt)
ELSE
BEGIN
SET @RAM = (SELECT CONVERT(INT, (physical_memory_in_bytes / 1024 / 1024) * 0.9) AS RAM_in_MB
FROM master.sys.dm_os_sys_info)
SET @Command = REPLACE(@Command, '$', @RAM)
SET @RAM_event = CONVERT(NVARCHAR(10), @RAM)
RAISERROR('MAX_SERVER_MEMORY set to %s', 0, 1, @RAM_event)
WITH NOWAIT, LOG
EXEC(@Command)
SET @RemSQL = 'EXEC (''' + REPLACE(@Command, '''', '''''') + ''') AT ' +
@Link
EXEC (@RemSQL)
END
END
GO
The script is executed via a SQL Server Agent job that is configured to run when the Agent service starts
up.
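The rebalancing arithmetic itself is simple. As a sketch (the function name is made up for illustration): take 90% of the node's physical RAM and split it evenly across the SQL Server instances detected on that node.

```python
def max_server_memory_mb(physical_memory_mb, instances_on_node):
    """90% of the node's RAM, split evenly across co-located instances.
    Mirrors the script's (total / instances) * 0.9 calculation, in MB."""
    return int((physical_memory_mb / instances_on_node) * 0.9)

# 384 GB node: one instance gets 90% of the whole node,
# two co-located instances each get 90% of half the node.
```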
SQL Server: Create Random String Using CLR
Introduction
This article is a continuation of a similar article that shows several solutions using T-SQL queries to create a random string. This paper presents simple code using the C# language, which obtains the same results in a much more efficient and faster manner. This is very useful for maintenance tasks like testing (populating large tables with random values), generating random passwords, and so on...
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Linq;
/******************************
* Version("1.1.0.0")
* FileVersion("1.1.0.0")
* WrittenBy("Ronen Ariely")
******************************/
// AssemblyVersion attribute
using System.Reflection;
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
[assembly: AssemblyDescription("Creating Random string using CLR. Written by
Ronen Ariely")]
/// <summary>
/// How To compile:
/// 1. Open CMD SHELL
/// 2. move to the Dot.Net Folder
/// CD "C:\Windows\Microsoft.NET\Framework\v4.0.30319\"
/// 3. compile using csc.exe
/// csc.exe /target:library /out:"S:\Fn_RandomStringCLR_1.1.0.0.dll"
"S:\Fn_RandomStringCLR_1.1.0.0.cs"
///
/// * LINQ is not supported by .NET 2.0 by default;
/// therefore this code targets .NET 4.
///
/// </summary>
public partial class UserDefinedFunctions
{
private static readonly Random _RandomSize = new Random();
private static readonly Random _random = new Random();
private static readonly int[] _UnicodeCharactersList =
    Enumerable.Range(48, 10)                 // digits 48 - 57
    .Concat(Enumerable.Range(65, 26))        // English uppercase 65 - 90
    .Concat(Enumerable.Range(97, 26))        // English lowercase 97 - 122
    .Concat(Enumerable.Range(1488, 27))      // Hebrew 1488 - 1514
    .ToArray();
/// <summary></summary>
/// <param name="sMaxSize"></param>
/// <param name="IsFixed"></param>
[return: SqlFacet(MaxSize = -1)]
public static SqlString Fn_RandomStringCLR(
int sMaxSize,
int IsFixed
)
{
if (IsFixed == 0){
sMaxSize = _RandomSize.Next(1, sMaxSize);
}
StringBuilder builder = new StringBuilder(); // declaration missing in the original listing
char ch;
for (int i = 0; i < sMaxSize; i++)
{
ch = Convert.ToChar(
    _UnicodeCharactersList[_random.Next(0, _UnicodeCharactersList.Length)] // lower bound 0 so the first character can also be picked
);
builder.Append(ch);
}
return builder.ToString();
}
};
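The character-pool logic can be sketched in Python for comparison. This is an illustration of the same four Unicode ranges the CLR function concatenates, not a replacement for it:

```python
import random

# digits, English uppercase, English lowercase, Hebrew - same ranges as the CLR list
POOL = [chr(c) for r in ((48, 58), (65, 91), (97, 123), (1488, 1515))
        for c in range(*r)]

def random_string(max_size, is_fixed):
    """Fixed-length string when is_fixed is truthy, else 1..max_size characters."""
    size = max_size if is_fixed else random.randint(1, max_size)
    return "".join(random.choice(POOL) for _ in range(size))
```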
Resources
Meta-Data
How to Compare Two Tables Definition / Metadata in Different
Databases
This article provides an example of a T-SQL script to compare two tables' definition / metadata in different databases.
The T-SQL script in this article can be used with SQL Server 2012 and later versions because it uses the function sys.dm_exec_describe_first_result_set, which was introduced in SQL Server 2012.
USE SQLServer2012
GO
CREATE Table Test1 (Id INT NOT NULL Primary Key,Name VARCHAR(100))
USE SQLServer2014
GO
CREATE Table Test2 (Id INT, Name VARCHAR(100), Details XML)
The T-SQL script below can be used to compare two tables' definition / metadata in different databases:
USE SQLServer2012
GO
SELECT A.name DB1_ColumnName,
B.name DB2_ColumnName,
A.is_nullable DB1_is_nullable,
B.is_nullable DB2_is_nullable,
A.system_type_name DB1_Datatype,
B.system_type_name DB2_Datatype,
A.collation_name DB1_collation,
B.collation_name DB2_collation,
A.is_identity_column DB1_is_identity,
B.is_identity_column DB2_is_identity,
A.is_updateable DB1_is_updateable,
B.is_updateable DB2_is_updateable,
A.is_part_of_unique_key DB1_part_of_unique_key,
B.is_part_of_unique_key DB2_part_of_unique_key,
A.is_computed_column DB1_is_computed_column,
B.is_computed_column DB2_is_computed_column,
A.is_xml_document DB1_is_xml_document,
B.is_xml_document DB2_is_xml_document
FROM SQLServer2012.sys.dm_exec_describe_first_result_set (N'SELECT * FROM
Test1', NULL, 0) A
FULL OUTER JOIN SQLServer2014.sys.dm_exec_describe_first_result_set (N'SELECT
* FROM Test2', NULL, 0) B
ON A.name = B.name
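In miniature, the comparison is a full outer join of two column-metadata sets keyed by column name. A Python sketch with hypothetical metadata for Test1 and Test2 (each column mapped to a (datatype, is_nullable) pair):

```python
# hypothetical column metadata for the two tables above
test1 = {"Id": ("int", False), "Name": ("varchar(100)", True)}
test2 = {"Id": ("int", True), "Name": ("varchar(100)", True), "Details": ("xml", True)}

def compare(a, b):
    """Full outer join on column name, keeping only mismatching columns.
    A missing column shows up as None on one side."""
    return {c: (a.get(c), b.get(c))
            for c in sorted(set(a) | set(b))
            if a.get(c) != b.get(c)}
```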
T-SQL: Script to Find the Names of Stored Procedures that Use Dynamic
SQL
This script was developed to answer the question in this thread: "I need a query to find all the SPs that use dynamic SQL".
We can execute dynamic SQL using sp_executesql or just with EXEC / EXECUTE.
To find the names of the stored procedures that may have used dynamic SQL, this script can be used:
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id)
StoredProceduresWithDynamicSQL
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')
But EXEC / EXECUTE can be used inside a stored procedure either to call another stored procedure or to execute dynamic SQL. So, to eliminate the stored procedures that merely reference another stored procedure, and to find only the stored procedures that use EXEC / EXECUTE to execute dynamic SQL, the following script can be used:
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id)
StoredProceduresWithDynamicSQL
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')
EXCEPT
SELECT StoredProcedure FROM (
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id) StoredProcedure
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')) tmp
CROSS APPLY sys.dm_sql_referenced_entities (StoredProcedure, 'OBJECT');
The above script will not work in the following scenarios: when EXEC / EXECUTE is used inside a stored procedure for both purposes, i.e. to call another stored procedure and to execute dynamic SQL; and when sp_executesql or EXEC / EXECUTE appears only in comments inside a stored procedure. Still, the above scripts are useful because we don't have any other direct way to find the names of the stored procedures that use dynamic SQL.
This script also won't work for encrypted procedures.
T-SQL Script to Get Detailed Information about Index Settings
This article is about a script I wrote to get detailed information about index settings. The script in this article does not show any information about missing indexes or index usage details; it only shows the settings made on an index using CREATE / ALTER INDEX statements.
Example
Just for a demonstration of the script, we will make use of the table Person.Address from the
AdventureWorks database.
Using system stored procedures SP_HELP and SP_HELPINDEX , we can get only the index_name,
index_description and index_keys details.
USE AdventureWorks2012
GO
sp_help 'Person.Address'
GO
sp_helpindex 'Person.Address'
GO
Just for testing purposes, I am going to create a NONCLUSTERED filtered index with included columns and then alter the fill factor of the created index.
USE AdventureWorks2012
GO
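The original CREATE/ALTER statements did not survive in this copy; a minimal sketch of such an index on Person.Address might look like this (the index name, key column, included column, and filter predicate are illustrative assumptions, not the author's originals):

```sql
-- Hypothetical filtered nonclustered index with an included column
CREATE NONCLUSTERED INDEX IX_Address_City_Filtered
ON Person.Address (City)
INCLUDE (PostalCode)
WHERE AddressLine2 IS NOT NULL;
GO

-- Alter the fill factor of the created index by rebuilding it
ALTER INDEX IX_Address_City_Filtered ON Person.Address
REBUILD WITH (FILLFACTOR = 80);
GO
```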
The below code block will get us the information about settings made on an index using CREATE /ALTER
INDEX statements:
USE AdventureWorks2012
GO
SELECT
CASE WHEN I.is_unique = 1 THEN ' UNIQUE ' ELSE '' END [Is_unique],
I.type_desc+' INDEX' IndexType,
I.name IndexName,
Schema_name(T.Schema_id)+'.'+T.name ObjectName,
KeyColumns,
IncludedColumns,
I.Filter_definition,
CASE WHEN I.is_padded = 1 THEN ' ON ' ELSE ' OFF ' END [PAD_INDEX],
I.Fill_factor,
' OFF ' [SORT_IN_TEMPDB] , -- default value
CASE WHEN I.ignore_dup_key = 1 THEN ' ON ' ELSE ' OFF ' END [Ignore_dup_key],
CASE WHEN ST.no_recompute = 0 THEN ' OFF ' ELSE ' ON ' END [Stats_Recompute],
' OFF ' [DROP_EXISTING] ,-- default value
' OFF ' [ONLINE] , -- default value
CASE WHEN I.allow_row_locks = 1 THEN ' ON ' ELSE ' OFF ' END [Allow_row_locks],
CASE WHEN I.allow_page_locks = 1 THEN ' ON ' ELSE ' OFF ' END [Allow_page_locks],
CASE WHEN ST.auto_created = 0 THEN ' Not Automatically Created ' ELSE ' Automatically Created ' END [Statistics_Creation],
CASE WHEN I.is_primary_key = 1 THEN 'Yes' ELSE 'NO' END 'Part of PrimaryKey',
CASE WHEN I.is_unique_constraint = 1 THEN 'Yes' ELSE 'NO' END 'Part of UniqueKey',
CASE WHEN I.is_disabled = 1 THEN 'Disabled' ELSE 'Enabled' END IndexStatus,
CASE WHEN I.Is_hypothetical = 1 THEN 'Yes' ELSE 'NO' END Is_hypothetical,
CASE WHEN I.has_filter = 1 THEN 'Yes' ELSE 'NO' END 'Filtered Index',
DS.name [FilegroupName]
FROM sys.indexes I
JOIN sys.tables T ON T.Object_id = I.Object_id
JOIN sys.sysindexes SI ON I.Object_id = SI.id AND I.index_id = SI.indid
JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' +
C.name + CASE WHEN MAX(CONVERT(INT,IC1.is_descending_key)) = 1 THEN ' DESC ' ELSE ' ASC ' END
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 0
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
ORDER BY MAX(IC1.key_ordinal)
FOR XML PATH('')), 1, 2, '') KeyColumns
FROM sys.index_columns IC2
WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp3 )tmp4
ON I.object_id = tmp4.object_id AND I.Index_id = tmp4.index_id
JOIN sys.stats ST ON ST.object_id = I.object_id AND ST.stats_id = I.index_id
JOIN sys.data_spaces DS ON I.data_space_id=DS.data_space_id
--JOIN sys.filegroups FG ON I.data_space_id=FG.data_space_id
LEFT JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' + C.name
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 1
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
FOR XML PATH('')), 1, 2, '') IncludedColumns
FROM sys.index_columns IC2
WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp1
WHERE IncludedColumns IS NOT NULL ) tmp2
ON tmp2.object_id = I.object_id AND tmp2.index_id = I.index_id
WHERE I.Object_id = object_id('Person.Address') --Comment for all tables
Related Reference links:
http://technet.microsoft.com/en-us/library/ms188783.aspx
http://technet.microsoft.com/en-us/library/ms173760.aspx
http://technet.microsoft.com/en-us/library/ms190283.aspx
http://technet.microsoft.com/en-us/library/ms175105.aspx
http://www.microsoft.com/en-in/download/details.aspx?id=722
How to Check when Index was Last Rebuilt
I have executed the script (given at the link below) to rebuild all table indexes on SQL Server, but how do I know whether it was actually updated?
Below is a query you can use to get the last statistics update date; it runs against the AdventureWorks database.
USE AdventureWorks;
GO
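The query itself is missing from this copy; a sketch that reports the last statistics update per index, using the STATS_DATE function, could look like this (the table filter is an assumption):

```sql
-- Last statistics update date for each index.
-- An index rebuild updates the index statistics, so STATS_DATE is a
-- common proxy for the last rebuild time.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name AS IndexName,
       STATS_DATE(i.object_id, i.index_id) AS StatsLastUpdated
FROM sys.indexes AS i
WHERE OBJECT_NAME(i.object_id) = 'Address'  -- comment out for all tables
ORDER BY StatsLastUpdated DESC;
```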
The need often arises to create or recreate the indexes for all tables in a database, especially in
development and testing scenarios. This article presents a script to generate Index Creation Scripts for
all tables in a database using Transact-SQL (T-SQL).
The code block below will generate Index Creation Scripts for all tables in a database:
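The generator script itself is missing from this copy; a simplified sketch that scripts out indexes from the catalog views might look like the following (included columns, filters, and index options are omitted for brevity, so this is not a complete generator):

```sql
-- Generate CREATE INDEX statements for all user tables (simplified sketch)
SELECT 'CREATE ' + CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END
     + i.type_desc + ' INDEX ' + QUOTENAME(i.name)
     + ' ON ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
     + ' (' +
       -- build the comma-separated key column list in key order
       STUFF((SELECT ', ' + QUOTENAME(c.name)
                     + CASE WHEN ic.is_descending_key = 1 THEN ' DESC' ELSE ' ASC' END
              FROM sys.index_columns AS ic
              JOIN sys.columns AS c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id
                AND ic.index_id = i.index_id
                AND ic.is_included_column = 0
              ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, '')
     + ');' AS CreateIndexScript
FROM sys.indexes AS i
JOIN sys.tables AS t ON t.object_id = i.object_id
WHERE i.name IS NOT NULL;   -- skips heaps
```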
Sometimes one needs to find out all the relationships within a database. For example, if you are a contractor and you go to a new company, even for only one day, just to make some new report requested by the boss or similar stuff, you probably need fast code that you can keep in your personal code folder, just for a quick copy and paste:
;with cte as (
    select c.constraint_object_id, c.constraint_column_id,
           c.parent_object_id as parentobjectid, c.parent_column_id,
           c.referenced_object_id, c.referenced_column_id,
           t.name as parentname
    from sys.foreign_key_columns c
    inner join sys.tables t on c.parent_object_id = t.object_id)
, cte2 as (
    select ct.constraint_object_id, ct.constraint_column_id, ct.parentobjectid,
           ct.referenced_object_id, ct.parent_column_id, ct.parentname,
           ct.referenced_column_id, t.name as referencedname
    from cte ct
    inner join sys.tables t on ct.referenced_object_id = t.object_id)
, cte3 as (
    select constraint_object_id, constraint_column_id, parentobjectid,
           parent_column_id, referenced_object_id, referenced_column_id,
           parentname, referencedname, cl.name as parentcolumname
    from cte2
    inner join sys.all_columns cl on parentobjectid = cl.object_id
    where cl.column_id = parent_column_id)
select constraint_object_id, constraint_column_id, parentobjectid,
       parent_column_id, referenced_object_id, referenced_column_id,
       parentname as ParentTable, referencedname as ReferencedTable,
       parentcolumname as parentsColumn, cl.name as ReferencedColumn
from cte3
inner join sys.all_columns cl on referenced_object_id = cl.object_id
where cl.column_id = referenced_column_id
order by ParentTable
Another purpose of this code is that the results (after saving them in a table, for example) can be compared over time. Say you save the result in a table called LastRelationship, dated February 2013, and months later you are called back for another contract at the same company because "maybe someone changed something and now the software doesn't work or the statistics are wrong". You can run the same query, building a new table LastRelationship dated October 2013, and by comparing the two tables you can quickly find out whether someone touched the relationships (believe me, this can happen pretty frequently).
So, I hope this code can help everyone be faster in case of job contract issues.
How to Check the Syntax of Dynamic SQL Before Execution
This article is about the system function sys.dm_exec_describe_first_result_set that can be used to
check the syntax of dynamic SQL before execution.
This system function sys.dm_exec_describe_first_result_set was introduced in SQL Server 2012.
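As a quick illustration (the query text here is a made-up broken statement, not from the original article), the function returns the parse error in its row with column_ordinal 0 instead of raising it:

```sql
-- The stray comma before FROM makes this query invalid;
-- the DMF describes the error rather than throwing it.
SELECT error_number, error_message
FROM sys.dm_exec_describe_first_result_set(N'SELECT Id, Name, FROM Test', NULL, 0)
WHERE column_ordinal = 0;
```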
CREATE Table Test (Id INT NOT NULL Primary Key,Name VARCHAR(100))
INSERT Test SELECT 1 , 'Sathya'
GO
CREATE PROCEDURE TestProc
AS
BEGIN
-- reconstructed: the original definition was truncated in this copy;
-- the dynamic SQL deliberately contains a stray comma before FROM
DECLARE @SQL NVARCHAR(MAX) = N'SELECT Id, Name, FROM Test'
IF EXISTS (
SELECT 1 FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE error_message IS NOT NULL
AND error_number IS NOT NULL
AND error_severity IS NOT NULL
AND error_state IS NOT NULL
AND error_type IS NOT NULL
AND error_type_desc IS NOT NULL )
BEGIN
SELECT error_message
FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE column_ordinal = 0
END
ELSE
BEGIN
EXEC (@SQL)
END
END
GO
If you examine the dynamic SQL in the above stored procedure, you will notice the incorrect syntax: an extra comma before the FROM clause.
EXEC TestProc
GO
After removing the comma before the FROM clause in the @SQL variable, alter the stored procedure.
ALTER PROCEDURE TestProc
AS
BEGIN
-- reconstructed: the corrected dynamic SQL, with the stray comma removed
DECLARE @SQL NVARCHAR(MAX) = N'SELECT Id, Name FROM Test'
IF EXISTS(
SELECT 1 FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE error_message IS NOT NULL
AND error_number IS NOT NULL
AND error_severity IS NOT NULL
AND error_state IS NOT NULL
AND error_type IS NOT NULL
AND error_type_desc IS NOT NULL )
BEGIN
SELECT error_message
FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE column_ordinal = 0
END
ELSE
BEGIN
EXEC (@SQL)
END
END
EXEC TestProc
GO
CHAPTER 13:
Bulk-Data
Using Bulk Insert to Import Inconsistent Data Format (Using Pure T-SQL)
Introduction
Some third-party applications produce reports as CSV files where each line is in a different format. These
applications use an export format that parses each record on-the-fly, and each value in the record
separately. The format for each value is set by the value itself.
For example, if there is data in the field, then it will be exported inside quotation marks, but if there is no data then the field will be blank and without quotation marks. Moreover, some applications do not use a 'data type' when generating a report. If the data in a field is numeric, then some applications might not use any quotation marks, while in the same field on a different record, non-numeric data will be exported with quotation marks. We can imagine a single CSV file with a specific column exported in 6 different formats.
In order to use bulk insert directly we have to make sure that all the data is consistent with one format.
The problem
We need to use bulk insert to import data from a CSV file into the SQL server database using pure T-SQL.
A Bulk insert operation can use only one format file and our metadata must remain consistent with it.
The first step in using bulk insert is to find a set of "bulk insert" rules (like: End Of Line, End Of Column, Collation…) that fit all of the data. As mentioned above, sometimes this is not the case.
If you got the answer in the forum that this can't be done using pure T-SQL then remember that I
always say "never say never".
* In this article we are going to talk only about a pure T-SQL solution, as there are several solutions (such
as using SSIS, CLR or a third party app) that can do it in different ways; sometimes in a better way.
Our application exports the data as a CSV Comma Delimited file without a consistent format. In this
example we will deal with a very common situation that fits these rules:
1. Our application uses column types (just to make it easier for this article we will focus on a string column). So a numeric column will never use quotation marks, and a string column will use quotation marks on and off according to these rules.
2. If there is data in the field then it will export the data inside quotation marks (no matter if the data is
numeric or not, as the column is string type)
3. If there is no data then the data will be blank and without quotation marks.
The original sample data that we use looks like this:
According to the application export rules above, our CSV file looks like this:
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2",yyy
2,8888888,"xxx1,xxx2",
3,,,"yyy"
4,7777777,,
,2222222,zzz,kkk
5,1111111,"5000.5","5"
* We can see in the last line that our application uses column types, so even when our value is numeric it will be inside quotation marks. But we have to remember that there are some more complex situations, like applications that do not use column types. Then the last line could look like: [5,5000.5,5]. And it can be more complex if the culture formats numbers like 5,000.5; then our CSV line might look like this: [5,5,000.5,5]
The solution:
* Remember that this is only a workaround for our specific case. For each set of data a slightly different
solution might fit. The idea of how to get to the solution is what is important here.
In this step we will run several tests with different format files. Our aim is to identify any potential problems and to find the best format file, one which fits as many columns as possible from the start.
First of all you have to find a record format that fits most of the data, as well as the columns that might be inconsistent with this format. In order to do that we are going to run several tests and then implement the conclusions in the next step. We will start with a simple bulk insert and continue with some more complex formats. Using the ERROR messages and the results, we will identify the potential problems.
Let's try this in practice
Open Notepad and copy our CSV data into the file.
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2",yyy
2,8888888,"xxx1,xxx2",
3,,,"yyy"
4,7777777,,
,2222222,zzz,kkk
5,1111111,"5000.5","5"
* Make sure that you use the ANSI format when you save the file (you can use a different format like UNICODE, but for this example we shall use ANSI).
Open Notepad and copy our XML format data into the file.
* Using a format file can help with more complex formats. I highly recommend always using a format file.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
  <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=","/>
  <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR=","/>
  <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR="\r\n"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
  <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>
Try to use this simple bulk insert query to import our data:
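The statement itself is missing from this copy; a plain bulk insert against the first format file might be sketched like this (the destination table name is an assumption; the file paths follow the C:\ArielyBulkInsertTesting\ convention used later in the article):

```sql
-- Sketch: simple import using the first format file (names/paths assumed)
BULK INSERT dbo.MembersTarget
FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (FORMATFILE = 'C:\ArielyBulkInsertTesting\Test01.xml');

SELECT * FROM dbo.MembersTarget;
```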
No error… have you got a good feeling? Let's check our data
Results:
In our case we can see that the first and second columns have no problem, but the problems start in the third column and continue into the fourth. First of all, we have some quotation marks in the results. Moreover, the third column was split in several records and part of the data moved into the fourth column. Actually, since our format file says that the third column ends at the comma, every time we have a comma as part of the string data it will be split. That makes sense.
When we have string data we surround the content in quotes. If our data had a consistent format then all string data would be enclosed in quotes, even empty data.
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2","yyy"
2,8888888,"xxx1,xxx2",""
3,,"","yyy"
4,7777777,"",""
,2222222,"zzz","kkk"
5,1111111,"5000.5","5"
In that case the solution was very simple. We could use this format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=','/>
  <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=',\"'/>
  <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\",\"'/>
  <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\"\r\n'/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
  <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>
Now execute the bulk insert and the data should be placed in the table correctly. If our data was
formatted in this way (with consistent format) then we would not need this article :-)
In some cases we might build a format file which brings us error messages. We already know that the data will not fit all records; this test will give us more information via the error message. Try to use this format file (C:\ArielyBulkInsertTesting\Test03.xml):
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
  <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=',\"'/>
  <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR=','/>
  <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\r\n'/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
  <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>
Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character
for the specified codepage) for row 4, column 2 (PhoneNumber).
Moreover, we can see that the rest of the records in our data were inserted (using SQL 2012). For data with a small number of inconsistent records this can be the best way, as most of the data is inserted. Now we can just check which records do not exist in the table and fix them. The error message includes the number of the first problematic row.
In conclusion, the best format file we found succeeds in inserting the first and second columns without any problem. We recognized that the problems start in the third column.
This is the main step, as now we can use bulk insert to import the data into SQL Server. Since we found
that our data does not have a consistent format, we are going to use a temporary table to import the
data.
* We don’t have to use a temporary table, as we can just use OPENROWSET to get the data and do the
parsing on-the-fly. I will show this in step 3.
The basic idea is to bring all the data before the problematic point (in our case the first and second columns) into separate columns, as they should appear in the final table, and the rest of the data, from the problematic point to its end (or to the end of the line if there is no other way), into one column. So in our case the third and fourth columns will be imported as one column.
Let's do it. We will use this format file (save as C:\ArielyBulkInsertTesting\Test04.xml), which is similar to
"Test01.xml" file, without the third column:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
  <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=','/>
  <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="30" TERMINATOR='\r\n'/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
  <COLUMN SOURCE="4" NAME="FirstName_LastName" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>
And execute this query (drop old table, create new table with 3 columns, bulk insert data, select and
show the data):
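That query is missing from this copy; a sketch matching the description, using the #test staging table that the parsing queries in step 3 refer to, could look like this:

```sql
-- Sketch (reconstructed from the description): rebuild and reload the staging table
IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;
CREATE TABLE #test (ID INT, PhoneNumber INT, FirstName_LastName NVARCHAR(30));

BULK INSERT #test
FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (FORMATFILE = 'C:\ArielyBulkInsertTesting\Test04.xml');

SELECT * FROM #test;
```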
ID PhoneNumber FirstName_LastName
----------- ----------- ------------------------------
1 9999999 "ronen","ariely"
2 8888888 "xxx1,xxx2","yyy"
2 8888888 "xxx1,xxx2",
3 NULL ,"yyy"
4 7777777 ,
NULL 2222222 "zzz","kkk"
5 1111111 "5000.5","5"
The goal of this article is to show an optimal use of pure T-SQL to import data which is not well formatted into the database, in a structure appropriate and effective for parsing (step 3). I will not elaborate on step 3 of parsing the data; there can be hundreds of different ways to do this for each case.
I'll show some sample solutions in step 3. Those solutions are not necessarily optimal parsing solutions,
but represent solutions in which I use different string functions for parsing our data. Usually when
parsing data in SQL Server it is best to use CLR functions.
Step 3: Parsing the data into the final table
Now that we imported the data, all that we need to do is parse the last column. These queries can do
the job in our case:
select
ID, PhoneNumber , FirstName_LastName
, FN = case
when CHARINDEX('",', FirstName_LastName, 1) > 0
then LEFT (
RIGHT(FirstName_LastName, LEN(FirstName_LastName) - 1)
, CHARINDEX('",', FirstName_LastName, 1) - 2
)
else ''
END
, LN = case
when CHARINDEX(',"', FirstName_LastName, 1) > 0
then SUBSTRING(
FirstName_LastName
, CHARINDEX(',"', FirstName_LastName, 1) + 2
, LEN(FirstName_LastName) - CHARINDEX(',"', FirstName_LastName, 1) - 2 )
else ''
END
from #test
go
-- I use the @ char but you should use any combination of chars
-- that cannot be in the data value!
-- I can clean all " chars in one pass as I know they are not part of my data
select ID, PhoneNumber , FirstName_LastName
, SUBSTRING(Temp, 0, charindex('@',Temp) ) FN
, SUBSTRING(Temp, charindex('@',Temp) + 1, LEN(Temp) - charindex('@',Temp)) LN
from (
select
ID, PhoneNumber , FirstName_LastName
, Temp = REPLACE(REPLACE(REPLACE(REPLACE(FirstName_LastName, '","', '@'),'",','@'),',"','@'),'"','')
from #test
) T
go
After we found a way to parse the data, we can use a simple SELECT INTO query to move the data from
the temporary table to the final table.
Usually, if this is not a one-time operation, I prefer to use one query that does it all, without declaring a temporary table. I still need these steps to find my bulk insert query & format (steps 1+2) and to find the parsing function (step 3). Next I convert my queries into an OPENROWSET import query like this (in our case study):
--FINAL TABLE
insert #FINAL
select
ID, PhoneNumber --, FirstName_LastName
, FN = case
when CHARINDEX('",', FirstName_LastName, 1) > 0
then LEFT (
RIGHT(FirstName_LastName, LEN(FirstName_LastName) - 1)
, CHARINDEX('",', FirstName_LastName, 1) - 2
)
else ''
END
, LN = case
when CHARINDEX(',"', FirstName_LastName, 1) > 0
then SUBSTRING(
FirstName_LastName
, CHARINDEX(',"', FirstName_LastName, 1) + 2
, LEN(FirstName_LastName) - CHARINDEX(',"', FirstName_LastName, 1) - 2 )
else ''
END
FROM OPENROWSET(
BULK N'C:\ArielyBulkInsertTesting\Test01.csv'
, FORMATFILE = 'C:\ArielyBulkInsertTesting\Test04.xml'
) a
GO
Summary
The basic idea is to bring all the data in the problematic columns (or until the end of the line if there is no other way) into one column. We can use a temporary table to store the data. Then we can parse that column in whatever way suits us: T-SQL functions or CLR functions like SPLIT, cleaning characters with REPLACE, locating characters with CHARINDEX, and so on. This all depends on your specific data; it has nothing to do with bulk insert anymore :-)
We must separate the operation into two parts:
1. Insert the data using bulk insert into the database (a temporary table, or using OPENROWSET) in such a way that we will be able to use it for step two
2. Parse and split the text in the last column into the final columns
Comments
* A more complex case study in which I used this logic can be seen in the MSDN forum in this link:
http://social.msdn.microsoft.com/Forums/en-US/5aab602e-1c6b-4316-9b7e-1b89d6c3aebf/bulk-insert-
help-needed
* Usually it is much better to do the parsing using CLR functions. If you are not convinced by my
recommendation then you can check this link: http://www.sqlperformance.com/2012/07/t-sql-
queries/split-strings
* If you can export the file in a consistent format that fits bulk insert, then you should do it! This is only a workaround solution.
* If you can build a well-formatted import file in advance, from the original import file, using a small application which will format a new file, then do it! This is a much better solution, as most languages do a better job of parsing text than SQL Server (T-SQL).
* If you can control the order of the columns during the export, then try to make sure that you move all the problematic columns to the end. This will let us use bulk insert in a more optimal way, as we will need to parse fewer columns in step 3.
* Why not import all the data into one column in a temp table instead of STEP 1 & STEP 2?
This is always an option, but probably not a good one. In our case study we use a very simple table structure with 4 columns and only 7 records, but in real life we might get a table with 20 columns or more and several million records. If we have 2 columns (out of 20) with potential problems and we can order the columns so those columns come last, then we can import most of the data (18 columns) directly into the final structure, and we will need to import only the last two columns into one column for parsing. It is much better to separate the data into as many columns as we can and minimize the parsing. Parsing is a CPU-intensive operation, and parsing the data after importing will probably take longer.
When you have to use complex parsing it is much better to use CLR solutions. As I mentioned at the start, this is a pure T-SQL solution.
Resources
* This article is based on several forum questions (more than 15, which I found using Google; I checked only the first several pages of search results) that remained unanswered for too long. I did not find any solutions or answers except my own based on this logic. This is a very easy solution, but we have to think outside the box to get it :-)
There are no other references for this solution that I know of and most forum questions that I found
were closed, or by sending the questioner to a different solution like using SSIS, or a third party
application, or by saying that it cannot be done using bulk insert and pure T-SQL.