
Why can't T-SQL do that?

Using SQL CLR with C# to transpose data

"Why can't T-SQL do that?" It's a common complaint I hear from my development team, and the bane of my job. As a member of a small development shop for a sizable healthcare company, I often find the line blurred between my official role as DBA and my ad-hoc role as database developer. I bridge the gap between our web and report developers' skill sets and the latest offerings in Microsoft SQL Server's ever-increasing toolset, demonstrating how a new feature can be used to approach a problem differently and more efficiently. One SQL 2005 feature of particular interest is the SQL CLR (Common Language Runtime), where SQL objects can be developed in the .NET Framework using robust languages such as C# or VB.NET. I have gravitated towards C#, which I find more intuitive. C# augments T-SQL's set-based strengths with procedural efficiency and richer data structures such as multidimensional arrays. How many times have you found yourself creating temp tables with cursors in T-SQL just to mimic an array? This article demonstrates how a C# array can easily solve a common need: transposing a query's result set.

The need to transpose data

The company I work for provides healthcare to our elderly population. The financial and clinical data, and the enterprise reports built on it, are not only critical to the health of the company but also necessary to ensure the high-quality care our patients deserve. This data is often date-driven: the query returns the dates in a column, but the report layout requires the dates to be displayed across the first row as column headers.
(Figure: report query results, and the report layout requirement)

It quickly becomes apparent that the report just needs the data from the source query flipped, so that the rows become the columns and the columns become the rows. In other words, we need to transpose the data. In Excel this is simple: either use the TRANSPOSE function or paste the data with the Transpose option of Paste Special. In T-SQL, however, there is no easy way to transpose data using built-in objects. Like many database shops, we eagerly tried out the new SQL 2005 PIVOT/UNPIVOT operators, and although we found them very useful, PIVOT/UNPIVOT does not answer the transpose question: we are not pivoting data into aggregation buckets. Herein lies our developers' complaint: "Why can't we simply transpose data in T-SQL, just switching rows and columns, like we can in Excel?"

The process seems simple enough, but I can understand why Microsoft does not provide a transpose function in T-SQL. Since the values within a row will usually not share the same datatype, the datatype of each transposed column cannot be determined without making some assumptions about the output. So rather than defending Microsoft by explaining the dilemma of the transpose to our developers, why not create our own transpose solution? If the data can be stored in an array, where manipulation is granular, the transposition is straightforward. The CLR and C# offer this flexibility, which T-SQL lacks.

Using the SQL CLR and C# to transpose data

Our goal is this: create a general solution that takes any query the Caller has permission, in its own security context, to execute, and transposes the result set. Of the available CLR SQL objects, a table-valued function might at first seem appropriate. However, table-valued functions require the output table schema to be defined at create time. Since our solution must work for any query, we will use a CLR stored procedure instead. The overall strategy is one that can serve as a template for general CLR data solutions:

1. Obtain the dataset within the CLR environment.
2. Manipulate the data.
3. Generate a result set, scalar value, and/or messages back to the Caller.
4. Handle errors.

Once you have mastered this basic strategy with the CLR, you have the foundation, and the confidence, to tackle a whole new world of complex data questions. When faced with a problem, your solution environment is no longer confined to T-SQL but expands to the toolsets of the CLR languages. In developing the transpose solution, we first examine a simple example to deconstruct the transpose process. Once the process is understood, we establish the assumptions for the solution and examine each section of the C# code.

Examining a simple transpose

Let's consider a simple query and the transpose of its result set:
select * from Sample

EXEC Transpose 'select * from Sample'

(Figure: the 3 rows X 4 columns result set of the query is transposed into 4 rows X 3 columns)

When considering the result set of a query, we typically do not consider the column header as a row. However, from our example above, we observe that the column names are also involved in the transpose and therefore are considered a row for our purposes. As a consequence, our 3 by 4 result set is transposed to a 4 by 3 result set.
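To see the header-as-a-row rule in code, here is a minimal C# sketch that works on in-memory strings rather than a live query. The class and method names (TransposeSketch, ToGrid, Transpose) and the sample values in the usage note are hypothetical; only the idea of storing the column names in row 0 and the rule transposedData[i, j] = queryData[j, i] come from this article.

```csharp
using System;

static class TransposeSketch
{
    // Hypothetical helper: stores the column names in row 0 and the data rows
    // below them, so the header counts as a row, as it does in the transpose.
    public static string[,] ToGrid(string[] columnNames, string[][] dataRows)
    {
        int rowCount = dataRows.Length + 1;      // +1 for the column-header row
        int columnCount = columnNames.Length;
        var grid = new string[rowCount, columnCount];

        for (int j = 0; j < columnCount; j++)
            grid[0, j] = columnNames[j];

        for (int i = 0; i < dataRows.Length; i++)
            for (int j = 0; j < columnCount; j++)
                grid[i + 1, j] = dataRows[i][j];

        return grid;
    }

    // Swaps rows and columns: an R by C grid becomes a C by R grid,
    // with transposed[i, j] = source[j, i].
    public static string[,] Transpose(string[,] source)
    {
        int rowCount = source.GetLength(0);
        int columnCount = source.GetLength(1);
        var transposed = new string[columnCount, rowCount];

        for (int i = 0; i < columnCount; i++)
            for (int j = 0; j < rowCount; j++)
                transposed[i, j] = source[j, i];

        return transposed;
    }
}
```

For example, four hypothetical column names ("Name", "Jan", "Feb", "Mar") plus two data rows give a 3 by 4 grid, and Transpose turns it into a 4 by 3 grid whose first row is "Name" followed by the first-column value of each data row.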

The data from the query is stored in the two-dimensional array queryData[i, j], where the i index references the row and the j index references the column. It is important to note that array elements are zero-based: the first position is referenced by 0, the second by 1, the third by 2, and so on. Therefore, the expression queryData[1, 5] refers to the value at the intersection of the second row (1) and the sixth column (5). With the data stored in the queryData array, we can now follow the transpose of a single element using array notation.
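The zero-based rule can be checked with a short sketch. It fills a hypothetical 3 by 6 array whose cells simply describe their own position, so reading queryData[1, 5] shows which row and column that index really is; the labels are illustrative, not query data.

```csharp
static class ZeroBasedIndexSketch
{
    // Fills a hypothetical rows x columns array in which each cell holds
    // its own 1-based position, e.g. "row 2, column 6".
    public static string[,] Build(int rows, int columns)
    {
        var queryData = new string[rows, columns];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < columns; j++)
                queryData[i, j] = "row " + (i + 1) + ", column " + (j + 1);
        return queryData;
    }
}
```

Build(3, 6)[1, 5] returns "row 2, column 6", and GetLength(0) and GetLength(1) on the same array return 3 and 6, the row and column counts.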

(Figure: the transpose, expressed as an array)

The data element 10.78 is contained in queryData[1, 2]. When the element is transposed, it moves to the zero-based location [2, 1]. If the transposed data is stored in a new array, transposedData[i, j], we have the following relationship:

(Figure: moving data to the new array)

transposedData[2, 1] = queryData[1, 2] = 10.78

Notice that the indices are simply switched. Generalizing, transposedData[i, j] = queryData[j, i], which is the basis of our solution.

Assumptions made for the solution

Assumptions are often made to narrow the scope of a solution. In this case, two assumptions are made:

1. The number of queried rows is limited to 2048. When an array is constructed, its size must be declared, and the size of the queryData[i, j] array is the number of rows and columns in our result set. When the array is constructed, we know the number of columns in the result set, but not the number of rows. Hence this limit is arbitrary, and it is configurable in the variable declaration section. In practice, I have never had a business need to transpose more than 180 rows; in testing, I have transposed 5000 rows by 784 columns of clinical data, which is over 3.9 million data elements.

2. The datatype of the transposed columns will be VARCHAR(100). The 100-character limit is configurable in the declaration section and can be as large as 8000; this limit has met our reporting requirements. But this limitation raises the question: is the assumption reasonable? Consider this transpose:

(Figure: a row transposed into a column)

Is the datatype of the transposed column DATETIME, VARCHAR, NUMERIC or INT? Because the datatypes within a row are typically mixed and we want our transpose solution to be general, we are forced to the lowest common denominator, VARCHAR. Moreover, the motivation for the transpose functionality is a reporting need where the presentation is text-based; any required aggregation can be performed in the input query itself.

The CLR C# Transpose stored procedure

Using the general strategy described above, this diagram outlines the program workflow.

(Diagram: within the CLR C# environment, the Transpose stored procedure executes the Caller's query and stores the data, transposes the dataset, pipes the transposed data back to the Caller, and handles errors)

The code is divided into five sections:

1. Variable declarations
2. Execute the Caller's query and store the data
3. Transpose the data
4. Output the data back to the Caller
5. Handle errors

Listing 1 provides the C# code in its entirety. We bypass the variable declarations and focus on the remaining sections.

Section 2: Execute the Caller's query and store the data

In this section, the Caller's query is executed and the result set is stored in an array. The first order of business is to open a connection to SQL Server: first create a SqlConnection object, then create a SqlCommand object from the Caller's query and the connection object. The connection is then opened:
conn = new SqlConnection("context connection=true;");
comm = new SqlCommand(callersQuery, conn);
conn.Open();

Take note of the connection string "context connection=true". With this connection, the query is executed in the same context as the Caller, in both location and security.

Next, the query is executed and its results are assigned to a SqlDataReader object. When the ExecuteReader method is called, query execution begins, but no data has been read yet. Before we begin reading the data rows, column metadata such as the count, names and datatypes can be obtained: the FieldCount property returns the column count, and the GetName method returns each column name. The column names are stored in the first row of the queryData array.
dataReader = comm.ExecuteReader();
columnCount = dataReader.FieldCount;
queryData = new string[maxNumberofRows, columnCount];
// Row 0 of queryData holds the column names
for (int j = 0; j < columnCount; j++)
{
    queryData[0, j] = dataReader.GetName(j);
}

Using the Read method of the reader object, the queryData array is filled with the result set. The rowCount variable increments each time a row is read to maintain the total number of rows.
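// rowCount must start at 1, because row 0 already holds the column names
while (dataReader.Read())
{
    for (int j = 0; j < columnCount; j++)
    {
        // DBNull.ToString() returns an empty string, so NULLs become ""
        queryData[rowCount, j] = dataReader[j].ToString();
    }
    rowCount++;
}
dataReader.Close();
conn.Close();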
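One consequence of the fixed-size array is worth spelling out. Because row 0 holds the column names, an array declared with maxNumberofRows rows has room for maxNumberofRows - 1 data rows; a larger result set makes the loop above index past the end of the array, which C# reports as an IndexOutOfRangeException (caught by the Section 5 handler). The sketch below is a hypothetical variation, not the article's code: the class and method names are illustrative, an in-memory sequence of rows stands in for the SqlDataReader so it can run outside SQL Server, and the limit is checked before each row is written so the Caller gets a clearer message.

```csharp
using System;
using System.Collections.Generic;

static class RowLimitSketch
{
    // Hypothetical variation on Section 2: 'rows' stands in for successive
    // dataReader.Read() calls so the logic can run outside SQL Server.
    public static string[,] Fill(string[] columnNames, IEnumerable<string[]> rows,
                                 int maxNumberofRows, out int rowCount)
    {
        int columnCount = columnNames.Length;
        var queryData = new string[maxNumberofRows, columnCount];

        // Row 0 holds the column names, leaving maxNumberofRows - 1 data rows.
        for (int j = 0; j < columnCount; j++)
            queryData[0, j] = columnNames[j];

        rowCount = 1;
        foreach (string[] row in rows)
        {
            // Check the limit before writing, instead of letting the array
            // throw IndexOutOfRangeException.
            if (rowCount >= maxNumberofRows)
                throw new InvalidOperationException(
                    "The query returned more than " + (maxNumberofRows - 1) +
                    " rows; increase maxNumberofRows to transpose it.");

            for (int j = 0; j < columnCount; j++)
                queryData[rowCount, j] = row[j];
            rowCount++;
        }
        return queryData;
    }
}
```

As in the article, rowCount ends up one larger than the number of data rows, so it can be used directly as the transposed column count.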

Once the outer loop finishes reading the rows, the queryData array is populated with the query's result set. To recap: the first row of the queryData array contains the column names, the remaining rows contain the queried data, rowCount holds the row count (plus 1 to account for the column-header row), and columnCount holds the column count of the result set.

Section 3: Transpose the data

The next order of business is to transpose the data into the new array transposedData. For readability, new row and column count variables are created and used to size the transposedData array. The new row count is the old column count, and the new column count is the old row count.
transposedRowCount = columnCount;
transposedColumnCount = rowCount;
transposedData = new string[transposedRowCount, transposedColumnCount];

Loop through the arrays, transposing the data from queryData into transposedData. The outer loop traverses the rows and the inner loop the columns.
for (int i = 0; i < transposedRowCount; i++)
{
    for (int j = 0; j < transposedColumnCount; j++)
    {
        transposedData[i, j] = queryData[j, i];
    }
}

The transpose is achieved with the statement transposedData[i, j] = queryData[j, i], which was gleaned from the "Examining a simple transpose" section. The transposedData array now contains the transposed data.

Section 4: Output the transposed result set back to the Caller

We are now ready to output, or pipe, the result set back to the Caller. The basic strategy for sending a result set back to the Caller is as follows:

1. Construct a SqlMetaData array, defining each column with a name, datatype, and data size.
2. Construct a SqlDataRecord object using the SqlMetaData array.
3. Begin the result set output by calling the SqlContext.Pipe.SendResultsStart method, passing in the SqlDataRecord.
4. Output the rows of the result set one row at a time: fill each field of the SqlDataRecord one column at a time, and pipe the record to the Caller with the SqlContext.Pipe.SendResultsRow method.
5. Finish the result set by calling the SqlContext.Pipe.SendResultsEnd method.

Following this strategy, construct the SqlMetaData array and define each column by looping through the column names. This is where the VARCHAR(100) assumption is applied, through the SqlDbType.VarChar and maxDataSize arguments. Remember, the first row of the transposedData array, transposedData[0, j], contains the transposed column names.
transposedColumns = new SqlMetaData[transposedColumnCount];
for (int j = 0; j < transposedColumnCount; j++)
{
    transposedColumns[j] = new SqlMetaData(transposedData[0, j], SqlDbType.VarChar, maxDataSize);
}

Now construct the SqlDataRecord and start the output.


rowRecord = new SqlDataRecord(transposedColumns);
SqlContext.Pipe.SendResultsStart(rowRecord);

Loop through the rows and columns of the transposedData array and send each row to the Caller. The row loop starts at i = 1 because row 0 has already supplied the column names.
for (int i = 1; i < transposedRowCount; i++)
{
    for (int j = 0; j < transposedColumnCount; j++)
    {
        rowRecord.SetSqlString(j, transposedData[i, j]);
    }
    SqlContext.Pipe.SendResultsRow(rowRecord);
}
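SqlContext.Pipe only exists inside SQL Server, but the shape of this output step can still be sketched. In the hypothetical stand-in below (PipeSketch and Render are illustrative names), a list of tab-separated lines plays the role of the piped result set: row 0 of transposedData becomes the header, as in the SqlMetaData loop, and rows 1 onward are emitted one at a time through a reused buffer, as SendResultsRow does with the single rowRecord.

```csharp
using System.Collections.Generic;

static class PipeSketch
{
    // Hypothetical text stand-in for Section 4: no SqlContext is involved.
    public static List<string> Render(string[,] transposedData)
    {
        int transposedRowCount = transposedData.GetLength(0);
        int transposedColumnCount = transposedData.GetLength(1);
        var output = new List<string>();
        var record = new string[transposedColumnCount];   // reused, like rowRecord

        // Row 0 supplies the column names (the SqlMetaData step).
        for (int j = 0; j < transposedColumnCount; j++)
            record[j] = transposedData[0, j];
        output.Add(string.Join("\t", record));

        // Data rows start at 1 (the SendResultsRow step).
        for (int i = 1; i < transposedRowCount; i++)
        {
            for (int j = 0; j < transposedColumnCount; j++)
                record[j] = transposedData[i, j];
            output.Add(string.Join("\t", record));
        }
        return output;
    }
}
```

Reusing one buffer mirrors the article's design, where a single SqlDataRecord is refilled and re-sent for every row instead of allocating a record per row.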

Finally, end the result set and send the Caller the message "Transpose complete." Messages are sent to the Caller using the SqlContext.Pipe.Send method.
SqlContext.Pipe.SendResultsEnd();
SqlContext.Pipe.Send("Transpose complete.");

Section 5: Handle errors

With CLR programming, it is important to handle errors gracefully and meaningfully. Errors from this procedure typically stem from a malformed input query. In the example below, the final "s" is missing from the sys.objects catalog view reference.
EXEC Transpose 'Select * from sys.object'

Messages:
There was a problem.

Exception Report: Invalid object name 'sys.object'.

As with the TRY/CATCH error handling introduced in SQL 2005 T-SQL, a try/catch construct is used to handle any errors that may occur during the entire transpose process.
try
{
    // Sections 2-4: execute the query, transpose the data, pipe the result set
}
catch (Exception e)
{
    SqlContext.Pipe.Send("There was a problem. \n\nException Report: ");
    SqlContext.Pipe.Send(e.Message);
}
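The control flow of this handler can be exercised without SQL Server. In the hypothetical sketch below, an Action stands in for Sections 2 to 4 and a List<string> collects what SqlContext.Pipe.Send would deliver; the class and method names are illustrative. It shows that "Transpose complete." reaches the Caller only when no exception escapes the transpose process, and that otherwise the Caller receives the two error messages instead.

```csharp
using System;
using System.Collections.Generic;

static class ErrorHandlingSketch
{
    // Hypothetical harness: 'transposeProcess' stands in for Sections 2-4 and
    // the returned list stands in for messages sent with SqlContext.Pipe.Send.
    public static List<string> Run(Action transposeProcess)
    {
        var messages = new List<string>();
        try
        {
            transposeProcess();
            messages.Add("Transpose complete.");
        }
        catch (Exception e)
        {
            messages.Add("There was a problem. \n\nException Report: ");
            messages.Add(e.Message);
        }
        return messages;
    }
}
```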

The CLR Transpose procedure is now complete, and control is returned to the Caller.
return;

Compile and Deploy the Transpose CLR stored procedure

It is not necessary to use Visual Studio 2005/2008 Professional to build this project. All that is required is the .NET Framework 2.0 (installed with your SQL 2005 instance), which supplies the C# compiler, and a text editor such as Notepad. Instructions for compiling and deploying the Transpose procedure are supplied in the download files.

Using the Transpose stored procedure

The syntax for the Transpose stored procedure is:
EXEC Transpose @query = 'Query to be transformed'

If the query contains single quotes to denote strings, replace each single quote with two single quotes, as shown below.
SELECT * FROM sys.objects WHERE [type] = 'S'

EXEC Transpose @query = 'SELECT * FROM sys.objects WHERE [type] = ''S'' '
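If a report front end or script generator writes this EXEC statement from C#, the quote-doubling rule is a one-line string replacement. The class and method names below are hypothetical. This only matters when the query is embedded in T-SQL text; when the query is instead passed as a parameter value through SqlCommand, no doubling is needed, because the value is never parsed as a T-SQL string literal.

```csharp
static class TransposeCallSketch
{
    // Doubles every single quote so the query survives inside a T-SQL
    // string literal, then wraps it in the EXEC Transpose syntax.
    public static string BuildTransposeCall(string query)
    {
        return "EXEC Transpose @query = '" + query.Replace("'", "''") + "'";
    }
}
```

BuildTransposeCall("SELECT * FROM sys.objects WHERE [type] = 'S'") produces the same statement as the escaped example above, without the trailing space before the closing quote.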

Experiment with various types of queries: simple, relational, and distributed. Also, intentionally introduce mistakes into the query syntax to see how the try/catch block handles the error.

Extend your T-SQL environment with the CLR

This Transpose procedure is a simple exercise in using the SQL CLR to extend your T-SQL environment and solve a common problem that is otherwise cumbersome to solve in traditional T-SQL. Once you have mastered executing queries, processing data, and generating result sets within the CLR C# environment, you will have the confidence to tackle more complex problems.
