Database Workshop
An ongoing manual to the ins and outs of T-SQL with regard to
the CHSI database structure.
CHSI SQL
Table of Contents
Introduction
  Some Terms
Tool Setup
Queries
  SELECT
    TOP
    Constant Columns
  FILTERING (the WHERE clause)
    Other Comparison Operators
    Combining comparisons
    Performance
  LIKE
    Wildcards
  Calculated Fields
    Concatenated Field
    CONCAT
    Math
Aggregate Functions (Counting, summarizing, etc)
  Counting
  Summing
  Max, Min, AVG
  Grouping
CASE
Combining Tables
  JOINS
  Sub Queries
DISTINCT
UNION
Converting Data Types
  Common DataTypes (used at CHSI)
  CONVERT
  Date Formatting
Modifying Data
  INSERT
  UPDATE
  MERGE
  Inserting Cascading Records
Introduction
This is a general guide to using T-SQL, emphasizing the areas most applicable to working
against CHSI SQL Servers.
Some Terms
These terms are presented in order of required understanding.
Database
The combination of tables, views, and other database objects that make up a given installation of
a customer's data.
TABLE
The storage container for raw data. Tables have two parts to their name, in the format
Schema.TableName. If you do not specify a schema, SQL Server will assume you mean the default
schema, dbo. For example, SELECT * FROM clients is interpreted as SELECT * FROM
dbo.clients, whereas SELECT * FROM UW.ScheduleItem is not translated, because the UW schema
is specified.
Figure 1 - Sample Table
ResultSet
A resultset can be imagined as a piece of paper with columns and rows, like a spreadsheet.
Resultsets are created via the execution of a query.
Query
A combination of commands that together produce one or more resultsets. Requesting more
columns increases the time a query takes; so does a lack of filters (which returns more rows) or
overly complex filters.
Subquery
A query can be nested inside another query in place of a table or a column. In this way you can
cascade queries for performance or complexity reasons.
VIEW
Instead of repeating a subquery, a query can be saved as a view. A view can be referenced
as if it were a table, but has the performance of the original query. Views can be simpler to work
with, but often perform less well than querying the table(s) directly. Like tables, views have two
parts to their name, Schema.ViewName. If you do not specify the schema, SQL will assume you
are referring to dbo.ViewName.
FROM
In a query, the FROM command designates where the columns of data should be pulled from,
and is normally either a table, a view or a subquery.
ALIAS
An alias renames an object in a query, either to make it easier to refer to elsewhere in the query,
to control the output, or to allow otherwise invalid characters like spaces. You'll see us use them
like this: SELECT PolicyPeriod [Policy Period]
JOIN
A join command adds an additional table to the left or right of the previous table, effectively
adding more possible columns to a resultset.
UNION
A union command adds an additional query to the bottom of a resultset, effectively adding
more possible rows to a resultset.
Index
An index acts like a hidden table that duplicates certain columns of a table, kept in sorted order.
Because this hidden table is smaller (has fewer columns) than the full table, searching it is faster.
When filtering data, using columns that are indexed can drastically improve the performance of
your queries. If a filter is slow, asking for an index to be added can enable that drastic
performance improvement.
Tool Setup
To maximize your experience, you should be using the latest supported version of SQL Server
Management Studio (referred to as SSMS). As of this writing, we use SSMS 2012.
Line Numbers
Turning on line numbers makes it much easier to ask for and receive help, since everyone can refer
to the same line.
1. Tools -> Options
2. Expand Text Editor (if not already expanded)
3. Select All Languages
4. Check Word Wrap and Line Numbers
5. Click OK
Queries
SELECT
The foundation of most users' interaction with SQL. SELECT lets you specify the columns
that form the basis of a resultset.
At its simplest, you list the columns after SELECT and the table after FROM, as in SELECT Column1,
Column2 FROM Table1. For example, SELECT ClientID, ClientType, ClientName FROM Clients
will retrieve the clientid, clienttype, and clientname for every client in the database.
Performance Warning
Every additional column makes the query a little (maybe even a lot) slower. Request the fewest
columns possible, and use the wildcard (*) seldom if at all, to maintain maximum
performance.
TOP
One of the most commonly used additions to SELECT is TOP. You can restrict how many rows
you return by changing your SELECT to something like SELECT TOP 15. Without an ORDER BY,
SQL Server does not guarantee which 15 rows you get.
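A minimal sketch of a basic SELECT and a SELECT TOP. So that it runs without a CHSI database, it declares a hypothetical table variable @Clients as a stand-in for dbo.Clients; the column names follow the text, and the rows are made up. It assumes SQL Server 2008 or later (multi-row VALUES).

```sql
-- Hypothetical stand-in for dbo.Clients, with made-up rows.
DECLARE @Clients TABLE (
    ClientID   int          NOT NULL PRIMARY KEY,
    ClientType varchar(20)  NOT NULL,
    ClientName varchar(100) NOT NULL
);

INSERT INTO @Clients (ClientID, ClientType, ClientName)
VALUES (10001, 'member',  'Maple County'),
       (10002, 'deleted', 'Old Town Co-op'),
       (10003, 'member',  'Millbrook Schools');

-- Basic SELECT: the columns you need, then the source after FROM.
SELECT ClientID, ClientType, ClientName
FROM @Clients;

-- TOP limits the row count; the ORDER BY makes the choice of rows predictable.
SELECT TOP 2 ClientID, ClientName
FROM @Clients
ORDER BY ClientID;
```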
Constant Columns
You can specify a string or other hardcoded value as a column. This can be useful for labeling
values in a report. As a general rule, constant columns add almost no cost to a query.
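FILTERING (the WHERE clause)
Adding a WHERE clause limits which rows are returned. For example, SELECT ClientID, ClientType,
ClientName FROM Clients WHERE ClientType <> 'deleted' is much more useful, creating a resultset
containing all clientids, clienttypes, and clientnames for all non-deleted clients.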
Notice
The Clients table is missing its schema name, so SQL Server infers that you would like to
reference dbo.Clients.
<> means not equal, so in the previous query, ClientType <> 'deleted' translates to
"ClientType is NOT equal to deleted".
Other Comparison Operators
=                            Equals
> or >=                      Greater Than, or Greater Than or Equal To
< or <=                      Less Than, or Less Than or Equal To
<> or !=                     Not Equal
LIKE                         Allows wildcard comparisons
IN (Value1, Value2, ...)     Matches any value in the list
BETWEEN Value1 AND Value2    Same as >= Value1 AND <= Value2 (both ends inclusive)
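A short sketch of the WHERE clause with <>, IN, and BETWEEN. The table variable @Clients is a hypothetical stand-in for dbo.Clients; its rows, including the 'prospect' client type, are made up for illustration.

```sql
-- Hypothetical stand-in for dbo.Clients.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ClientType varchar(20), ClientName varchar(100));
INSERT INTO @Clients VALUES
    (10001, 'member',   'Maple County'),
    (10002, 'deleted',  'Old Town Co-op'),
    (10003, 'member',   'Millbrook Schools'),
    (10004, 'prospect', 'Northside Fire');

-- All non-deleted clients (<> means "not equal").
SELECT ClientID, ClientType, ClientName
FROM @Clients
WHERE ClientType <> 'deleted';

-- IN matches any value in the list.
SELECT ClientID FROM @Clients WHERE ClientType IN ('member', 'prospect');

-- BETWEEN is inclusive: the same as ClientID >= 10002 AND ClientID <= 10003.
SELECT ClientID FROM @Clients WHERE ClientID BETWEEN 10002 AND 10003;
```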
Combining comparisons
You can combine multiple comparisons using either AND or OR.
AND is simple: write two filters and join them with AND, and a row must pass both.
If you use OR, control the order of comparisons by placing parentheses around the groups of
comparisons you want evaluated together. SQL evaluates AND before OR, so without parentheses
you may accidentally allow more rows than you want.
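A sketch of the AND/OR pitfall described above, using a hypothetical table variable with a made-up State column. The intent is "non-deleted clients in TX or OK"; the unparenthesized version also lets a deleted client through.

```sql
-- Hypothetical clients; State is an assumed column for illustration.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ClientType varchar(20), State char(2));
INSERT INTO @Clients VALUES
    (1, 'member',  'TX'),
    (2, 'member',  'OK'),
    (3, 'deleted', 'OK'),
    (4, 'member',  'KS');

-- AND binds first, so this means:
-- (ClientType <> 'deleted' AND State = 'TX') OR State = 'OK'
-- Returns clients 1, 2, and 3 (the deleted client 3 slips in).
SELECT ClientID FROM @Clients
WHERE ClientType <> 'deleted' AND State = 'TX' OR State = 'OK';

-- Parentheses make the intended grouping explicit: returns clients 1 and 2.
SELECT ClientID FROM @Clients
WHERE ClientType <> 'deleted' AND (State = 'TX' OR State = 'OK');
```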
Performance
LIKE
You can use LIKE in a WHERE clause to do partial matching. In a LIKE pattern, % matches anything
(any string of zero or more characters). For example, to find all clients whose clientname starts
with m, you would write something like WHERE ClientName LIKE 'm%'.
Wildcards
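%      Any string of zero or more characters
_      Any single character
[]     Any single character in the set or range, such as [abc] or [a-f]
[^]    Any single character NOT in the set or range, such as [^a-f]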
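A sketch of each wildcard against a hypothetical list of client names. Whether 'm%' also matches 'Maple' depends on the column's collation (most SQL Server installs are case-insensitive), so the patterns here use uppercase letters that match the made-up data either way.

```sql
-- Hypothetical client names.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ClientName varchar(100));
INSERT INTO @Clients VALUES
    (1, 'Maple County'), (2, 'Millbrook Schools'), (3, 'Northside Fire'), (4, 'Mesa ISD');

SELECT ClientName FROM @Clients WHERE ClientName LIKE 'M%';     -- starts with M: 3 rows
SELECT ClientName FROM @Clients WHERE ClientName LIKE '_e%';    -- second character is e: Mesa ISD
SELECT ClientName FROM @Clients WHERE ClientName LIKE '[A-M]%'; -- starts with A through M: 3 rows
SELECT ClientName FROM @Clients WHERE ClientName LIKE '[^M]%';  -- does not start with M: Northside Fire
```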
Calculated Fields
In addition to selecting fields, you can create calculated column outputs.
Concatenated Field
CONCAT
Math
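A minimal sketch of calculated fields: concatenation with +, CONCAT, and arithmetic. The @Contacts and @Items table variables and their columns are hypothetical. CONCAT requires SQL Server 2012 or later; the NULL behavior of + shown here assumes the default CONCAT_NULL_YIELDS_NULL ON setting.

```sql
-- Hypothetical contacts; MiddleName is NULL for Ann.
DECLARE @Contacts TABLE (ContactID int PRIMARY KEY, FirstName varchar(50),
                         MiddleName varchar(50) NULL, LastName varchar(50));
INSERT INTO @Contacts VALUES (1, 'Ann', NULL, 'Lee'), (2, 'Bo', 'R', 'Diaz');

SELECT ContactID,
       -- With +, one NULL makes the whole result NULL (Ann's PlusName is NULL).
       FirstName + ' ' + MiddleName + ' ' + LastName AS PlusName,
       -- CONCAT treats NULL as an empty string (Ann's ConcatName is 'Ann  Lee').
       CONCAT(FirstName, ' ', MiddleName, ' ', LastName) AS ConcatName
FROM @Contacts;

-- Hypothetical line items for math examples.
DECLARE @Items TABLE (ItemID int PRIMARY KEY, Quantity int, UnitCost decimal(9,2));
INSERT INTO @Items VALUES (1, 3, 19.99), (2, 7, 2.50);

SELECT ItemID,
       Quantity * UnitCost AS LineTotal,        -- item 1: 59.97
       Quantity / 2        AS IntegerDivision,  -- int / int truncates: item 2 gives 3
       Quantity / 2.0      AS DecimalDivision   -- a decimal operand keeps the fraction: 3.5
FROM @Items;
```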
Aggregate Functions (Counting, summarizing, etc)
Counting
COUNT(*) returns a count of all rows in the resultset.
COUNT(ColumnName) returns a count of all non-null values in that column.
COUNT(DISTINCT ColumnName) returns a count of distinct non-null values.
Summing
Very simple: wrap SUM() around the column you want to total. SUM also accepts DISTINCT, as in
SUM(DISTINCT ColumnName), to total only distinct values, though the use case for this is rare.
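A sketch comparing the counting and summing variants, plus MAX, MIN, and AVG, on a hypothetical @Claims table variable. One claim has a NULL TotalPaid to show how the functions treat NULLs.

```sql
-- Hypothetical claims; claim 2 has no payment recorded (NULL).
DECLARE @Claims TABLE (ClaimID int PRIMARY KEY, ClientID int, TotalPaid decimal(9,2) NULL);
INSERT INTO @Claims VALUES
    (1, 10001, 100.00),
    (2, 10001, NULL),
    (3, 10002, 100.00),
    (4, 10003, 50.00);

SELECT COUNT(*)                 AS AllRows,          -- 4
       COUNT(TotalPaid)         AS NonNullPaid,      -- 3
       COUNT(DISTINCT ClientID) AS DistinctClients,  -- 3
       SUM(TotalPaid)           AS SumPaid,          -- 250.00
       SUM(DISTINCT TotalPaid)  AS SumDistinctPaid,  -- 150.00 (100 counted once)
       MAX(TotalPaid)           AS MaxPaid,          -- 100.00
       MIN(TotalPaid)           AS MinPaid,          -- 50.00
       AVG(TotalPaid)           AS AvgPaid           -- 250.00 / 3: the NULL row is ignored
FROM @Claims;
```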
Grouping
Usually you won't want a single aggregate over every row; you'll want results for each
parent record, or by something arbitrary like month. Add a GROUP BY that lists every selected
column you don't include in an aggregate function.
You can be very clever in grouping to generate some useful reports. Just be careful: don't group
by month alone if you really intend the month of a given year, or January of every year will be
added together.
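A sketch of GROUP BY, including the month-versus-year-and-month trap. The @Payments table variable and its rows are hypothetical.

```sql
-- Hypothetical payments spanning two Januaries.
DECLARE @Payments TABLE (PaymentID int PRIMARY KEY, ClientID int, PaidDate date, Amount decimal(9,2));
INSERT INTO @Payments VALUES
    (1, 10001, '2012-01-15', 100.00),
    (2, 10001, '2013-01-10',  40.00),
    (3, 10002, '2012-01-20',  25.00),
    (4, 10001, '2012-02-03',  10.00);

-- One row per client: every non-aggregated column appears in the GROUP BY.
SELECT ClientID, SUM(Amount) AS TotalPaid
FROM @Payments
GROUP BY ClientID;

-- Month alone: January 2012 and January 2013 land in the same group (2 rows).
SELECT MONTH(PaidDate) AS PaidMonth, SUM(Amount) AS TotalPaid
FROM @Payments
GROUP BY MONTH(PaidDate);

-- Year and month: each calendar month stays separate (3 rows).
SELECT YEAR(PaidDate) AS PaidYear, MONTH(PaidDate) AS PaidMonth, SUM(Amount) AS TotalPaid
FROM @Payments
GROUP BY YEAR(PaidDate), MONTH(PaidDate);
```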
CASE
Many times, you will want to select different values depending on the values in other columns.
Anything you can use in a WHERE clause, you should be able to use in a CASE expression. For
that matter, most functionality that works in one clause works in every other clause.
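A sketch of both forms of CASE: the simple form compares one value, and the searched form accepts any WHERE-style condition. The table variable, the labels, and the 1000 threshold are all hypothetical.

```sql
-- Hypothetical clients with a pre-computed total paid.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ClientType varchar(20), TotalPaid decimal(9,2));
INSERT INTO @Clients VALUES (1, 'member', 0), (2, 'member', 1500), (3, 'deleted', 200);

SELECT ClientID,
       CASE ClientType                    -- simple CASE: one value, several matches
            WHEN 'member'  THEN 'Active'
            WHEN 'deleted' THEN 'Inactive'
            ELSE 'Unknown'
       END AS StatusLabel,
       CASE                               -- searched CASE: first true condition wins
            WHEN TotalPaid = 0     THEN 'No payments'
            WHEN TotalPaid >= 1000 THEN 'Large'
            ELSE 'Small'
       END AS PaidBand
FROM @Clients;
```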
Combining Tables
A properly built database will store data in multiple tables, either for performance or for the
purpose of not duplicating data. You will almost always want to combine these multiple tables
together to generate a complete resultset.
You have a few different options for combining tables.
JOINS
Joins are the primary way to combine multiple tables. With a join, you pick one or more
columns in each table whose values must match.
INNER: Only include rows where the value exists in both the left and right tables.
LEFT: Include every row from the left table, plus the rows from the right table where the
values match (the right table's columns are NULL where there is no match).
RIGHT: Include every row from the right table, plus the rows from the left table where
the values match (the left table's columns are NULL where there is no match).
FULL OUTER: Include every row from both the left and right tables, whether or not there is a
match on the other side.
CROSS: No value matching here; repeat every row on the right for every row on the left.
An inner join can, for example, match every contact to the rows of the clientcontacts table
on the contactid column, then filter to only clientid 10003.
A CROSS JOIN can produce a resultset containing every group combined with every
possible groupspec.
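A sketch of the inner join, left join, and cross join described above. The table variables are hypothetical stand-ins for the contacts, clientcontacts, group, and groupspec tables; their column names (other than ContactID and ClientID) and rows are assumptions.

```sql
-- Hypothetical stand-ins for the real CHSI tables.
DECLARE @Contacts       TABLE (ContactID int PRIMARY KEY, ContactName varchar(100));
DECLARE @ClientContacts TABLE (ClientID int, ContactID int);
DECLARE @Groups         TABLE (GroupID varchar(10));
DECLARE @GroupSpecs     TABLE (SpecName varchar(50));

INSERT INTO @Contacts VALUES (1, 'Ann Lee'), (2, 'Bo Diaz'), (3, 'Cy Park'), (4, 'Di Ray');
INSERT INTO @ClientContacts VALUES (10003, 1), (10003, 2), (10004, 3);  -- Di Ray has no link
INSERT INTO @Groups VALUES ('PTSIG'), ('PBSIG');
INSERT INTO @GroupSpecs VALUES ('RenewalDate'), ('Deductible'), ('Limit');

-- INNER JOIN: only contacts linked to client 10003 (2 rows).
SELECT cc.ClientID, c.ContactID, c.ContactName
FROM @ClientContacts AS cc
INNER JOIN @Contacts AS c ON c.ContactID = cc.ContactID
WHERE cc.ClientID = 10003;

-- LEFT JOIN: every contact; ClientID is NULL for Di Ray (4 rows).
SELECT c.ContactID, c.ContactName, cc.ClientID
FROM @Contacts AS c
LEFT JOIN @ClientContacts AS cc ON cc.ContactID = c.ContactID;

-- CROSS JOIN: every group paired with every spec (2 x 3 = 6 rows).
SELECT g.GroupID, s.SpecName
FROM @Groups AS g
CROSS JOIN @GroupSpecs AS s;
```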
Sub Queries
You can nest queries in columns or in place of tables. For performance reasons, you
should avoid nesting queries in columns, relying instead on well-crafted subqueries in the FROM.
A good use case for joining on a subquery is to avoid writing several subquery columns without
having to group the parent table(s).
For example, a subquery can generate a sum of all credits and debits per client; in some cases,
this outperforms grouping the parent query.
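A sketch of joining on an aggregating subquery: the log is summed once per client inside the derived table, and the parent client list is joined to it without being grouped itself. The @AcctsRecvLog columns (TransType, Amount) are hypothetical stand-ins, not the real AcctsRecvLog structure.

```sql
-- Hypothetical stand-ins for dbo.Clients and dbo.AcctsRecvLog.
DECLARE @Clients      TABLE (ClientID int PRIMARY KEY, ClientName varchar(100));
DECLARE @AcctsRecvLog TABLE (LogID int PRIMARY KEY, ClientID int, TransType varchar(10), Amount decimal(9,2));

INSERT INTO @Clients VALUES (10001, 'Maple County'), (10002, 'Mesa ISD');
INSERT INTO @AcctsRecvLog VALUES
    (1, 10001, 'Debit',  500.00),
    (2, 10001, 'Credit', 200.00),
    (3, 10001, 'Debit',   50.00);

SELECT c.ClientID, c.ClientName,
       ISNULL(t.TotalDebits, 0)  AS TotalDebits,   -- 10001: 550.00, 10002: 0
       ISNULL(t.TotalCredits, 0) AS TotalCredits   -- 10001: 200.00, 10002: 0
FROM @Clients AS c
LEFT JOIN (
    -- Aggregate once per client, then join the summary to the ungrouped parent.
    SELECT ClientID,
           SUM(CASE WHEN TransType = 'Debit'  THEN Amount ELSE 0 END) AS TotalDebits,
           SUM(CASE WHEN TransType = 'Credit' THEN Amount ELSE 0 END) AS TotalCredits
    FROM @AcctsRecvLog
    GROUP BY ClientID
) AS t ON t.ClientID = c.ClientID;
```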
DISTINCT
Often, when joining, you'll end up with duplicate rows. In those cases, you can use SELECT
DISTINCT to remove all duplicates. In many cases, though, duplicates mean bad data or an
incomplete join. ALWAYS verify your joins before attempting DISTINCT. For example, if you simply
want all of the fields and values for ScheduleItemID 1 and get repeated rows, fix the join condition
rather than hiding the problem with DISTINCT.
A legitimate use of DISTINCT is getting a list of all policyperiods for each client from the
AcctsRecvLog table.
UNION
While JOINS merge tables horizontally, sometimes we want to merge the resultsets vertically.
This is what a UNION does, combining two resultsets with the only caveat being that they must
have the same number of columns, in the same order, with compatible types of data in them.
When doing a default UNION, SQL will automatically apply a DISTINCT to the resultset.
For example, a UNION can return all clients with all of their policyperiods from both the
AcctsRecvLog table and the Invoice table.
There is a performance hit to doing a UNION (or a DISTINCT, for that matter), and sometimes you
don't care about duplicates. If that's the case, you can use UNION ALL for a faster resultset,
though be prepared for the duplicates.
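A sketch contrasting DISTINCT, UNION, and UNION ALL. The two table variables are hypothetical stand-ins for AcctsRecvLog and Invoice, reduced to a ClientID and a made-up PolicyPeriod text column.

```sql
-- Hypothetical stand-ins; note the duplicate row in @AcctsRecvLog.
DECLARE @AcctsRecvLog TABLE (ClientID int, PolicyPeriod varchar(9));
DECLARE @Invoice      TABLE (ClientID int, PolicyPeriod varchar(9));
INSERT INTO @AcctsRecvLog VALUES (10001, '2012-2013'), (10001, '2012-2013'), (10002, '2012-2013');
INSERT INTO @Invoice      VALUES (10001, '2012-2013'), (10001, '2013-2014');

-- DISTINCT: one row per client/period from a single table (2 rows).
SELECT DISTINCT ClientID, PolicyPeriod FROM @AcctsRecvLog;

-- UNION: duplicates removed across both queries (3 rows).
SELECT ClientID, PolicyPeriod FROM @AcctsRecvLog
UNION
SELECT ClientID, PolicyPeriod FROM @Invoice;

-- UNION ALL: every row kept, no duplicate-removal step (5 rows).
SELECT ClientID, PolicyPeriod FROM @AcctsRecvLog
UNION ALL
SELECT ClientID, PolicyPeriod FROM @Invoice;
```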
Converting Data Types
Common DataTypes (used at CHSI)
VARCHAR(n | max): A common string datatype, holding various forms of text. n must be a number
from 1 to 8000, OR the word max to tell SQL to store as much text as it can (up to 2 GB).
VARBINARY(n | max): Stores binary files like images or PDFs. n must be a number from 1 to 8000,
OR the word max to tell SQL to store files as large as it can (up to 2 GB).
Bit: A 1 or 0; very small, very fast. Used for true/false or yes/no values.
Int: Any whole (non-decimal) number from -2^31 to 2^31 - 1 (-2,147,483,648 to 2,147,483,647).
Decimal(p,s): A precise number including decimals. p is the total number of digits stored,
while s is how many of those digits are to the right of the decimal point. Decimal(9,2)
represents a number like 1234567.89.
Date: Stores a date only, no time component.
DateTime: Stores both a date and its time component. The time is accurate to about 3
milliseconds (values are rounded to increments of .000, .003, or .007 seconds).
SmallDateTime: Stores both date and time, but only from January 1, 1900 through June 6, 2079,
and the time is only accurate to the minute. It uses half the storage of DateTime (4 bytes
instead of 8).
CONVERT
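Sometimes, SQL will automatically (implicitly) convert for you. The data type conversion chart in
the SQL Server documentation shows when this will happen.
Most often, you'll want to convert datatypes explicitly. For extensive documentation, see the
CAST and CONVERT reference in the SQL Server documentation. As a quick overview, you call
CONVERT by passing the new datatype, then the value to convert, and optionally a style code:
CONVERT(data_type, expression, style).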
Date Formatting
One of the most useful features of CONVERT is the ability to supply simple date formats (style
codes) when converting dates to text.
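A sketch of common CONVERT style codes applied to one hypothetical ItemDate value, plus a date-only conversion and a text-to-decimal conversion. The style codes (101, 103, 112, 120) are standard SQL Server styles.

```sql
-- A hypothetical ItemDate value (ISO 8601 literal, unambiguous in any language setting).
DECLARE @ItemDate datetime = '2012-03-05T14:07:09';

SELECT CONVERT(varchar(10), @ItemDate, 101) AS US_Date,        -- 03/05/2012
       CONVERT(varchar(10), @ItemDate, 103) AS UK_Date,        -- 05/03/2012
       CONVERT(varchar(8),  @ItemDate, 112) AS ISO_Basic,      -- 20120305
       CONVERT(varchar(19), @ItemDate, 120) AS ODBC_Canonical, -- 2012-03-05 14:07:09
       CONVERT(date, @ItemDate)             AS DateOnly,       -- time component dropped
       CONVERT(decimal(18,4), '12.5')       AS TextToDecimal;  -- 12.5000
```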
Modifying Data
INSERT
Inserting simple data is simple: define your columns and the values you want to insert. You
must provide, at minimum, all required columns for a table; any column you don't provide is
given its default value (usually NULL).
In addition to static values, however, you can also insert the results of a SELECT statement.
A typical case is inserting a contact, presumably from some batch process, using the GETDATE()
function for the current date/time, then selecting the inserted contact to see the new id and
timestamp, using the SCOPE_IDENTITY() function to find the id that was just generated.
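A sketch of INSERT with VALUES, SCOPE_IDENTITY(), and INSERT ... SELECT. It uses a temporary table #Contacts as a hypothetical stand-in for dbo.Contacts, assuming the real table has an IDENTITY key; the column list is made up.

```sql
-- Hypothetical stand-in for dbo.Contacts with an IDENTITY key.
IF OBJECT_ID('tempdb..#Contacts') IS NOT NULL DROP TABLE #Contacts;
CREATE TABLE #Contacts (
    ContactID   int IDENTITY(1,1) PRIMARY KEY,
    FirstName   varchar(50)  NOT NULL,
    LastName    varchar(50)  NOT NULL,
    Email       varchar(100) NULL,
    DateCreated datetime     NOT NULL
);

-- Static values: list the columns, then matching values.
INSERT INTO #Contacts (FirstName, LastName, Email, DateCreated)
VALUES ('Ann', 'Lee', 'ann.lee@example.com', GETDATE());

-- SCOPE_IDENTITY() returns the identity value generated by the insert above.
SELECT ContactID, FirstName, LastName, DateCreated
FROM #Contacts
WHERE ContactID = SCOPE_IDENTITY();

-- INSERT ... SELECT: insert whatever rows a query returns (here, a hypothetical batch).
DECLARE @Batch TABLE (FirstName varchar(50), LastName varchar(50), Email varchar(100));
INSERT INTO @Batch VALUES ('Bo', 'Diaz', 'bo.diaz@example.com'), ('Cy', 'Park', NULL);

INSERT INTO #Contacts (FirstName, LastName, Email, DateCreated)
SELECT FirstName, LastName, Email, GETDATE()
FROM @Batch;
```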
UPDATE
When you need to change a value in a table, you can use the UPDATE statement.
WARNING: Always use a WHERE clause to avoid accidentally setting your entire table to a
single set of values. Some people go so far as to add an always-false condition while writing a
query to prevent accidental updates; something like WHERE 1=2 is enough to keep any rows from
being updated. Then, prior to executing a reviewed update, you simply delete the 1=2 part.
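A sketch of the WHERE 1=2 guard on a hypothetical @Clients table variable. Writing the guard as "1 = 2 AND ..." means deleting that one piece leaves a valid WHERE clause.

```sql
-- Hypothetical clients.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ClientType varchar(20));
INSERT INTO @Clients VALUES (10001, 'member'), (10002, 'member');

-- While drafting: the always-false 1 = 2 means no rows can be updated.
UPDATE @Clients
SET ClientType = 'deleted'
WHERE 1 = 2
  AND ClientID = 10002;

-- After review: the guard is removed, and only client 10002 changes.
UPDATE @Clients
SET ClientType = 'deleted'
WHERE ClientID = 10002;
```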
MERGE
In many cases, you'll want to update a value if it's already there, or insert it if it's not. The
MERGE statement can handle this scenario and more.
For example, MERGE can make sure a groupspec is inserted if it does not exist, or updated if it does.
You can also pretend you have some values imported from a spreadsheet by loading them into a
table variable. You can use this method for testing further queries prior to actually importing data.
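A sketch of an upsert with MERGE, using a table variable of spreadsheet-style values as the source. Both @GroupSpecs (a stand-in for the groupspec table) and @Import, along with their columns and values, are hypothetical. Note that a MERGE statement must end with a semicolon.

```sql
-- Hypothetical stand-in for the groupspec table: one value per group and spec name.
DECLARE @GroupSpecs TABLE (GroupID varchar(10), SpecName varchar(50), SpecValue varchar(100),
                           PRIMARY KEY (GroupID, SpecName));
INSERT INTO @GroupSpecs VALUES ('PTSIG', 'RenewalDate', '07/01');

-- Values we might have imported from a spreadsheet.
DECLARE @Import TABLE (GroupID varchar(10), SpecName varchar(50), SpecValue varchar(100));
INSERT INTO @Import VALUES ('PTSIG', 'RenewalDate', '10/01'), ('PTSIG', 'Deductible', '1000');

MERGE @GroupSpecs AS tgt
USING @Import AS src
   ON tgt.GroupID = src.GroupID
  AND tgt.SpecName = src.SpecName
WHEN MATCHED THEN                      -- already exists: update it
    UPDATE SET SpecValue = src.SpecValue
WHEN NOT MATCHED BY TARGET THEN        -- missing: insert it
    INSERT (GroupID, SpecName, SpecValue)
    VALUES (src.GroupID, src.SpecName, src.SpecValue);

SELECT GroupID, SpecName, SpecValue FROM @GroupSpecs;  -- RenewalDate updated, Deductible added
```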
MERGE output
With the knowledge of MERGE and table variables, we can now do a cascading insert, grabbing
new IDs from the MERGE and inserting those values into a related table. We are going to use
the @ImportSpreadSheet table variable from the prior example, along with another variable to
hold the generated contactids.
This is a fairly complicated query. We insert any contacts that don't already exist (assuming
that emails should be unique) into the contacts table, grabbing their new contactIDs and
inserting them into that second variable. We then join that variable back to the original
spreadsheet data and insert the result into the clientcontacts table.
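A sketch of the cascading insert described above. Temporary tables #Contacts and #ClientContacts are hypothetical stand-ins for the real tables, and the @ImportSpreadSheet columns are assumed. MERGE is used instead of INSERT because its OUTPUT clause can return source columns (here, Email) next to the new identity values, which is what lets the new IDs be matched back to the spreadsheet rows.

```sql
-- Hypothetical stand-ins for dbo.Contacts and dbo.ClientContacts.
IF OBJECT_ID('tempdb..#Contacts') IS NOT NULL DROP TABLE #Contacts;
IF OBJECT_ID('tempdb..#ClientContacts') IS NOT NULL DROP TABLE #ClientContacts;
CREATE TABLE #Contacts (ContactID int IDENTITY(1,1) PRIMARY KEY,
                        ContactName varchar(100), Email varchar(100) UNIQUE);
CREATE TABLE #ClientContacts (ClientID int, ContactID int);
INSERT INTO #Contacts (ContactName, Email) VALUES ('Ann Lee', 'ann.lee@example.com');

-- Spreadsheet rows: Ann already exists; Bo and Cy are new.
DECLARE @ImportSpreadSheet TABLE (ClientID int, ContactName varchar(100), Email varchar(100));
INSERT INTO @ImportSpreadSheet VALUES
    (10003, 'Ann Lee', 'ann.lee@example.com'),
    (10003, 'Bo Diaz', 'bo.diaz@example.com'),
    (10004, 'Cy Park', 'cy.park@example.com');

-- Holds the ContactID generated for each new email.
DECLARE @NewContacts TABLE (ContactID int, Email varchar(100));

-- Insert contacts whose email is not already present, capturing the new IDs.
MERGE #Contacts AS tgt
USING @ImportSpreadSheet AS src
   ON tgt.Email = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ContactName, Email) VALUES (src.ContactName, src.Email)
OUTPUT inserted.ContactID, src.Email INTO @NewContacts (ContactID, Email);

-- Join the new IDs back to the spreadsheet rows and insert the link records.
INSERT INTO #ClientContacts (ClientID, ContactID)
SELECT s.ClientID, n.ContactID
FROM @ImportSpreadSheet AS s
INNER JOIN @NewContacts AS n ON n.Email = s.Email;
```

As written, the sketch only links newly inserted contacts; an existing contact such as Ann would need a separate join against #Contacts to be linked.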
You could do the same thing for deletes, deleting all of the child records for a given set of
records.
DELETE
Warning: We generally recommend not deleting data; instead, set the relevant status
column to inactive or deleted as appropriate. In some cases you will want to delete a row; if you
know this is the case, proceed.
Like the UPDATE statement, the DELETE statement requires extreme caution. You may want
to add a WHERE 1=2 or something similar until you're 100% sure of your query.
The syntax is almost identical to the UPDATE statement: indicate the table you want to delete
from and the conditions you want to match on.
Be careful with related records. Deleting only the test contact inserted in the prior example would
leave stranded clientcontacts rows. A cleaner alternative also deletes the matching clientcontacts
records. This is a bit contrived; another option is to delete the clientcontacts first.
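A sketch of the "delete the children first" option, using hypothetical table variables for contacts and clientcontacts. The child DELETE joins to the parent so that it can match on the contact's email.

```sql
-- Hypothetical stand-ins for dbo.Contacts and dbo.ClientContacts.
DECLARE @Contacts       TABLE (ContactID int PRIMARY KEY, Email varchar(100));
DECLARE @ClientContacts TABLE (ClientID int, ContactID int);
INSERT INTO @Contacts VALUES (1, 'ann.lee@example.com'), (2, 'test@example.com');
INSERT INTO @ClientContacts VALUES (10003, 1), (10003, 2);

-- 1. Delete the child rows first, so no clientcontacts are stranded.
DELETE cc
FROM @ClientContacts AS cc
INNER JOIN @Contacts AS c ON c.ContactID = cc.ContactID
WHERE c.Email = 'test@example.com';

-- 2. Then delete the parent row (keep a WHERE 1=2 guard here until reviewed).
DELETE FROM @Contacts
WHERE Email = 'test@example.com';
```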
ORDER BY
If you do not tell the database how to order the resultset, it will decide for you how the resultset
should be ordered, and that order can change from one run to the next. If the data is for end-user
consumption, you'll usually want to order the data intentionally. To do this, end your query with
ORDER BY, then a list of columns you want the resultset ordered by. By default, the data is sorted
ascending, but you can explicitly define the direction of the sort by appending ASC or DESC to the
column. For example, ORDER BY Column1 ASC, Column2 DESC. You can refer to columns that exist
in the FROM, or to the aliased columns defined in the SELECT.
For example, when selecting all records without an ORDER BY, the itemDates can come back in
seemingly random order. Adding ORDER BY ClientID, ItemDate sorts the same data by ClientID,
then by ItemDate.
UNION Ordering
When you ORDER a UNION, ORDER BY works a bit differently. The ORDER BY goes after the last
query, applies to the whole combined resultset, and can only use columns in that resultset. You can
still order by a calculated column such as PolicyPeriod. Column names come from the first query,
so the second query does not need to name the column that is rendered as PolicyPeriod in the
first resultset.
Using Functions
Sorting is generally incredibly simple, but just as you can use functions everywhere else, you
can, and should, use functions to control how the resultset is sorted in useful ways.
For example, suppose we decide (for whatever reason) that we want the FieldName Value to come
first. We can ORDER BY a CASE expression that returns 0 when FieldName = 'Value' and 1 for
anything else. SQL handles the rest, ordering the 0s, then the 1s.
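A sketch of ORDER BY with explicit directions and of ordering by a CASE expression. Both table variables and their rows are hypothetical stand-ins for schedule item data.

```sql
-- Hypothetical schedule items and field values.
DECLARE @ScheduleItems TABLE (ScheduleItemID int PRIMARY KEY, ClientID int, ItemDate date);
INSERT INTO @ScheduleItems VALUES (1, 10002, '2012-05-01'), (2, 10001, '2012-07-15'), (3, 10001, '2012-02-10');

DECLARE @FieldValues TABLE (FieldName varchar(50), FieldValue varchar(100));
INSERT INTO @FieldValues VALUES ('Make', 'Ford'), ('Value', '25000'), ('Model', 'F-150');

-- ClientID ascending, then the newest ItemDate first within each client.
SELECT ScheduleItemID, ClientID, ItemDate
FROM @ScheduleItems
ORDER BY ClientID ASC, ItemDate DESC;

-- 'Value' sorts first (0), then everything else (1) alphabetically.
SELECT FieldName, FieldValue
FROM @FieldValues
ORDER BY CASE WHEN FieldName = 'Value' THEN 0 ELSE 1 END, FieldName;
```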
A CASE expression can also guard a conversion: first check that the column is numeric, and only
if it is, convert it to decimal(18,4).
If we pass a string without specifying the parameter name, SQL assigns it by position, so the
string we passed becomes @CoverageLine.
For example, you can retrieve the results, store them in a temporary table, then join the clients
table to add the clientnumber.
The other reason you don't usually want a subquery in the SELECT is that each such subquery
can return only a single value.
Apply
A subquery in the FROM can't be correlated: it can't refer to columns from the other tables in that
FROM. That's what APPLY is for, allowing you to pass values from the other source into a
subquery or table-valued function.
CROSS APPLY
CROSS APPLY works similarly to an INNER JOIN, returning only the rows from the left that have
results on the right. You write CROSS APPLY, then a subquery, and give it an alias.
OUTER APPLY
OUTER APPLY works like an outer (LEFT) join, showing you rows from the left whether or not the
right side returns results. Run as an OUTER APPLY, the same query returns a lot more data.
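A sketch of CROSS APPLY and OUTER APPLY: for each client, a correlated TOP 1 subquery fetches the most recent payment. The @Clients and @Payments table variables and their rows are hypothetical.

```sql
-- Hypothetical clients and payments; 10002 has no payments.
DECLARE @Clients  TABLE (ClientID int PRIMARY KEY, ClientName varchar(100));
DECLARE @Payments TABLE (PaymentID int PRIMARY KEY, ClientID int, PaidDate date, Amount decimal(9,2));
INSERT INTO @Clients VALUES (10001, 'Maple County'), (10002, 'Mesa ISD');
INSERT INTO @Payments VALUES (1, 10001, '2012-01-15', 100.00), (2, 10001, '2012-06-01', 40.00);

-- CROSS APPLY: the subquery can use c.ClientID; clients with no payment drop out (1 row).
SELECT c.ClientID, c.ClientName, p.PaidDate, p.Amount
FROM @Clients AS c
CROSS APPLY (
    SELECT TOP 1 pay.PaidDate, pay.Amount
    FROM @Payments AS pay
    WHERE pay.ClientID = c.ClientID
    ORDER BY pay.PaidDate DESC
) AS p;

-- OUTER APPLY: same subquery, but 10002 is kept with NULLs (2 rows).
SELECT c.ClientID, c.ClientName, p.PaidDate, p.Amount
FROM @Clients AS c
OUTER APPLY (
    SELECT TOP 1 pay.PaidDate, pay.Amount
    FROM @Payments AS pay
    WHERE pay.ClientID = c.ClientID
    ORDER BY pay.PaidDate DESC
) AS p;
```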
Summarizing Data
Generating some numbers for a report is easy with Excel or Crystal. But sometimes you want to
do it with just a raw SQL query. This can be especially useful for powering the scheduled batch file
functions in Connections.
WITH CUBE
When you CUBE the data, you ask SQL to add summary rows for every combination of the
grouped columns. For example, adding WITH CUBE to a query that counts schedule items by client
and location shows how many schedule items are at each client and location, how many for each
client, how many for each location, and how many in total.
WITH ROLLUP
If you care more about the grand totals, you can use WITH ROLLUP. ROLLUP summarizes from left
to right through the GROUP BY list, so grouping by client, then location, shows how many schedule
items exist at each location within each client, at each client, and in total.
This is all well and good, but we can do better, making the data a bit easier to read by using
GROUPING. GROUPING(column) tells you what type of row you are looking at for a given column:
1 means it's a summary row for that column, and 0 means it's a normal row. That means you can
use a CASE expression to produce better labels.
The important bit here is that you need to check both columns to see whether a row is the Grand
Total or just a subtotal line.
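A sketch of WITH ROLLUP with GROUPING-based labels. The @ScheduleItems table variable is a hypothetical stand-in for UW.ScheduleItem, and LocationName is an assumed column.

```sql
-- Hypothetical schedule items: client 10001 has two locations, 10002 has one.
DECLARE @ScheduleItems TABLE (ScheduleItemID int PRIMARY KEY, ClientID int, LocationName varchar(50));
INSERT INTO @ScheduleItems VALUES
    (1, 10001, 'Main St'), (2, 10001, 'Main St'), (3, 10001, 'Depot'), (4, 10002, 'Depot');

SELECT CASE WHEN GROUPING(ClientID) = 1 THEN 'Grand Total'
            ELSE CAST(ClientID AS varchar(10)) END AS Client,
       CASE WHEN GROUPING(ClientID) = 1     THEN ''
            WHEN GROUPING(LocationName) = 1 THEN 'Client Subtotal'
            ELSE LocationName END AS Location,
       COUNT(*) AS ScheduleItemCount
FROM @ScheduleItems
GROUP BY ClientID, LocationName WITH ROLLUP;
-- 6 rows: 3 client/location rows, 2 client subtotals, 1 grand total (4 items).
```

Swapping WITH ROLLUP for WITH CUBE also adds per-location rows across all clients; on those rows GROUPING(ClientID) is 1 while GROUPING(LocationName) is 0, so a real query would need to check both columns before labeling a row Grand Total.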
Numbering Rows
ROW_NUMBER
Sometimes, you want to generate a new number for each row based on certain criteria.
ROW_NUMBER() generates a sequential number that starts over for each group you name in
PARTITION BY, in the order you give in ORDER BY. For example, you can tell it to start over for
each GroupID, and to order by the sum of total paid for each client's claims (ascending).
RANK()
With ROW_NUMBER, tied rows still get different numbers. For the PBSIG group, for example, there
are a lot of ties, yet the PaymentOrder column keeps counting up. If we were trying to reward
clients based on their totalpaid, it wouldn't be fair to give some of the clients who paid 0 a higher
number than others who also paid 0. This is what RANK() is for.
With RANK(), all of the clients who paid 0 have a payment rank of 1.
WARNING: Further down the list, when there finally is an amount, that client is given a payment
rank of 354. This is because there are 353 rank 1s; tied rows share a rank, but they still use up the
rank positions after them.
You may want to know who is #2, #3, etc., regardless of how many clients might tie.
DENSE_RANK() performs this function.
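A sketch comparing ROW_NUMBER, RANK, and DENSE_RANK on made-up, already-summed client totals. The @ClientTotals table variable is hypothetical; in a real report TotalPaid would come from summing claims.

```sql
-- Hypothetical per-client totals; three PBSIG clients tie at 0.
DECLARE @ClientTotals TABLE (GroupID varchar(10), ClientID int, TotalPaid decimal(9,2));
INSERT INTO @ClientTotals VALUES
    ('PBSIG', 1, 0), ('PBSIG', 2, 0), ('PBSIG', 3, 0), ('PBSIG', 4, 250), ('PBSIG', 5, 900),
    ('PTSIG', 6, 100), ('PTSIG', 7, 50);

SELECT GroupID, ClientID, TotalPaid,
       -- 1, 2, 3, ... even across ties (order among tied rows is arbitrary)
       ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY TotalPaid) AS PaymentOrder,
       -- ties share a rank, and the next rank skips ahead
       RANK()       OVER (PARTITION BY GroupID ORDER BY TotalPaid) AS PaymentRank,
       -- ties share a rank, and the next rank is the next integer
       DENSE_RANK() OVER (PARTITION BY GroupID ORDER BY TotalPaid) AS PaymentDenseRank
FROM @ClientTotals
ORDER BY GroupID, TotalPaid;
```

In the PBSIG rows, the three clients who paid 0 all get RANK 1, so client 4 gets RANK 4 but DENSE_RANK 2: the same pattern as the 353 ties followed by rank 354.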
CTE
At its most basic, a CTE (Common Table Expression) is a named resultset, defined with
WITH Name AS (query), that the statement immediately following it can reference. This can make
a query easier to read (and sometimes faster), by defining a very specific small resultset that you
then join onto a more involved table, or by letting you join onto a resultset more than once
without repeating the query. Note that SQL Server does not save a CTE's results; each reference
runs its query again.
Recursive/hierarchal CTEs
The true power of CTEs comes when you make a hierarchical (recursive) CTE. You can UNION ALL
a CTE against itself, then query the results. For example, you can start with a set of clients, then
UNION ALL the CTE back against the clients table, and it continues until it runs out of new rows
to add.
WARNING: If it recurses more than 100 times (the default limit), the CTE will fail. For more info,
read about the MAXRECURSION query hint.
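A sketch of a recursive CTE walking a client hierarchy. The ParentClientID column and the rows in @Clients are hypothetical; they stand in for whatever parent/child relationship the real tables hold.

```sql
-- Hypothetical client hierarchy: a pool with members, one of which has a sub-member.
DECLARE @Clients TABLE (ClientID int PRIMARY KEY, ParentClientID int NULL, ClientName varchar(100));
INSERT INTO @Clients VALUES
    (1, NULL, 'State Pool'),
    (2, 1,    'Maple County'),
    (3, 2,    'Maple County Fire'),
    (4, 1,    'Mesa ISD');

WITH ClientTree AS (
    -- Anchor member: top-level clients.
    SELECT ClientID, ParentClientID, ClientName, 0 AS Depth
    FROM @Clients
    WHERE ParentClientID IS NULL
    UNION ALL
    -- Recursive member: children of rows already found; stops when no new rows appear.
    SELECT c.ClientID, c.ParentClientID, c.ClientName, t.Depth + 1
    FROM @Clients AS c
    INNER JOIN ClientTree AS t ON c.ParentClientID = t.ClientID
)
SELECT ClientID, ClientName, Depth
FROM ClientTree
OPTION (MAXRECURSION 100);  -- 100 is the default; 0 removes the limit
```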
Sometimes a table ends up with duplicate rows. For that, a CTE can be a simple solution. When you
delete or update through a CTE, the change actually affects the base table. We combine the CTE
with the ROW_NUMBER function to number each copy of a row, then delete everything except the first.
Each of the duplicates is now different, thanks to the RN column. Test first with a SELECT, then do
the delete.
Run against three identical rows, this deletes 2 of the 3 duplicates and leaves a single copy.
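A sketch of removing duplicates through a CTE with ROW_NUMBER. The @Contacts table variable and its duplicate rows are hypothetical; PARTITION BY lists the columns that define "the same row".

```sql
-- Hypothetical contacts: three identical Ann Lee rows.
DECLARE @Contacts TABLE (FirstName varchar(50), LastName varchar(50), Email varchar(100));
INSERT INTO @Contacts VALUES
    ('Ann', 'Lee',  'ann.lee@example.com'),
    ('Ann', 'Lee',  'ann.lee@example.com'),
    ('Ann', 'Lee',  'ann.lee@example.com'),
    ('Bo',  'Diaz', 'bo.diaz@example.com');

-- Number each copy; identical rows now differ by RN.
WITH Numbered AS (
    SELECT FirstName, LastName, Email,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Email
                              ORDER BY (SELECT NULL)) AS RN
    FROM @Contacts
)
-- Test first by running SELECT * FROM Numbered WHERE RN > 1 in place of this DELETE.
DELETE FROM Numbered
WHERE RN > 1;

SELECT FirstName, LastName, Email FROM @Contacts;  -- one Ann Lee and one Bo Diaz remain
```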
Dynamic SQL
There are some things you cannot do via parameters, or at least not easily or cleanly. For those
situations, you may want a query that builds another query and executes it.
Dynamic SQL should be constructed in a few steps. You declare an NVARCHAR variable to store
the query text and another to store the list of parameters, then you call the sp_executesql system
stored procedure with your query, your parameter list, and then however many parameter values
you want to pass.
For example, if the query filters on @xGroupID, the PTSIG text is passed in as the @xGroupID
parameter because it is the first value after the parameter list. When you pass values without
naming them, whether a literal string (like 'PTSIG') or a variable, it is the order of the values that
matters, not their names.
That isn't very useful; you could have written the query directly. Where the power comes in is
doing things like a dynamic SELECT TOP. You may want a report user to be able to control how
many records are returned. (SQL Server 2005 and later also accept a variable in parentheses, as in
SELECT TOP (@NumberofRecords), so this particular case can be solved without dynamic SQL, but
it makes a simple demonstration.) To demonstrate, we construct the query dynamically.
You may wonder why not just declare the @NumberofRecords parameter as a VARCHAR(3) to
begin with, since then you wouldn't need to CONVERT. The primary reason is to prevent people
from abusing or hijacking your query. If you accept a string into a query that you know should be a
number, only to save a few characters, you let the user pass in all kinds of wrong text, like perhaps
'100 *,', which would turn your list of clientids into a dump of every column in the table.
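A sketch of sp_executesql with a real parameter, followed by a dynamic TOP built from an integer. It uses a hypothetical temporary table #Clients with a GroupID column, because table variables declared in the outer batch are not visible inside sp_executesql, while temporary tables are.

```sql
-- Hypothetical clients; a temp table is visible to the dynamic batch.
IF OBJECT_ID('tempdb..#Clients') IS NOT NULL DROP TABLE #Clients;
CREATE TABLE #Clients (ClientID int PRIMARY KEY, GroupID varchar(10), ClientName varchar(100));
INSERT INTO #Clients VALUES
    (10001, 'PTSIG', 'Maple County'), (10002, 'PTSIG', 'Mesa ISD'), (10003, 'PBSIG', 'Northside Fire');

DECLARE @DynamicQuery    nvarchar(max),
        @ParamList       nvarchar(500),
        @NumberofRecords int = 1;  -- int, not varchar, so only a number can be spliced in

-- The group is passed as a real parameter; here it is passed by position.
SET @DynamicQuery = N'SELECT ClientID, ClientName FROM #Clients WHERE GroupID = @xGroupID;';
SET @ParamList    = N'@xGroupID varchar(10)';
EXEC sp_executesql @DynamicQuery, @ParamList, 'PTSIG';

-- Dynamic TOP: the only spliced-in text comes from converting an int.
SET @DynamicQuery = N'SELECT TOP ' + CONVERT(nvarchar(10), @NumberofRecords)
                  + N' ClientID, ClientName FROM #Clients WHERE GroupID = @xGroupID ORDER BY ClientID;';
EXEC sp_executesql @DynamicQuery, @ParamList, @xGroupID = 'PTSIG';  -- named form also works
```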
To build a dynamic column list, we create a variable called @DynamicQuery, just like before, but
instead of writing a complete query, we write only part of the query, stopping where we want the
dynamic columns to be generated.
Then we SELECT the dynamic portion. A SELECT that assigns to a variable is processed row by row,
so we can take advantage of that iterative process by appending to the variable on each row: the
value of the SELECT is the variable plus a new field. Because that field might contain characters that
are invalid in SQL names, like spaces, we wrap the column name in brackets. Be aware that
Microsoft does not guarantee this row-by-row concatenation behavior, so always check the
generated text.
We can imagine how the SQL looks if we write it down:
Iteration   SQL
0           SELECT ScheduleItemID
1           SELECT ScheduleItemID, [Make]
2           SELECT ScheduleItemID, [Make], [Model]
3           SELECT ScheduleItemID, [Make], [Model], [Year]
By SELECTing your query, you can diagnose what went wrong, or help visualize step by step
what is actually happening. This is an important tool to master, if you decide to delve into
dynamic SQL.
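A sketch of building the dynamic column list row by row, then printing the result instead of running it. The @Fields table variable and the #ScheduleItemPivot target table are hypothetical. QUOTENAME adds the brackets and also escapes any ] inside a name. As noted above, SQL Server does not guarantee this concatenation pattern, so the PRINT step is the check.

```sql
-- Hypothetical field list; in practice this would come from a field-definition table.
DECLARE @Fields TABLE (SortOrder int PRIMARY KEY, FieldName varchar(50));
INSERT INTO @Fields VALUES (1, 'Make'), (2, 'Model'), (3, 'Year');

-- Start with the fixed part of the query.
DECLARE @DynamicQuery nvarchar(max) = N'SELECT ScheduleItemID';

-- Each row appends ", [FieldName]" to the text built so far.
SELECT @DynamicQuery = @DynamicQuery + N', ' + QUOTENAME(FieldName)
FROM @Fields
ORDER BY SortOrder;

-- Finish the statement (hypothetical source table).
SET @DynamicQuery = @DynamicQuery + N' FROM #ScheduleItemPivot;';

-- Inspect the generated SQL before ever executing it.
PRINT @DynamicQuery;
```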
PIVOT
Dynamic PIVOT Columns
PERFORMANCE
Using Set Statistics IO On
Resources
Some references used to create the above examples
http://msdn.microsoft.com/en-us/library/bb545450.aspx
http://shannonlowder.com/
http://blog.sqlauthority.com/