
PROC TRANSPOSE FOR MULTIPLE VALUES

KANHAIYA GUPTA, TCS, FOND DU LAC, WISCONSIN

ABSTRACT

PROC TRANSPOSE rotates (transposes) SAS data sets, turning rows into columns or
columns into rows. It has a limitation, however: it does not work as required when the
VAR variable has multiple values for the same ID value within a BY group. This paper
demonstrates how to transpose such data without losing any record in the output
dataset.

PROC TRANSPOSE
PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET>
               <NAME=name> <OUT=output-data-set> <PREFIX=prefix>;
   BY <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;
   COPY variable(s);
   ID variable;
   IDLABEL variable;
   VAR variable(s);

Options
DATA= input-data-set names the SAS data set to transpose.
Default: most recently created SAS data set

LABEL= label specifies a name for the variable in the output data set that contains the
label of the variable that is being transposed to create the current observation.
Default: _LABEL_

LET allows duplicate values of an ID variable. PROC TRANSPOSE transposes the
observation containing the last occurrence of a particular ID value within the data
set or BY group.

NAME= name specifies the name for the variable in the output data set that contains the
name of the variable being transposed to create the current observation.
Default: _NAME_
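
As a point of reference, the following minimal sketch applies these statements to a case where PROC TRANSPOSE works directly, because every ID value occurs only once per BY group. The data set name SALES and its variables (STORE, QTR, AMOUNT) are made up for illustration and do not come from the paper.

```sas
/* Hypothetical input: one AMOUNT per STORE and QTR (names are illustrative) */
data sales;
  input store qtr $ amount;
  datalines;
1 Q1 10
1 Q2 20
2 Q1 30
2 Q2 40
;
run;

/* One output row per STORE, one column per distinct QTR value */
proc transpose data=sales out=sales_wide(drop=_name_);
  by store;      /* input is already sorted by STORE */
  id qtr;        /* values of QTR become the new variable names */
  var amount;    /* values of AMOUNT fill the new columns */
run;
```

SALES_WIDE should contain one row per STORE with the columns Q1 and Q2. The rest of the paper deals with the case where this uniqueness assumption does not hold.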

PROBLEM
Suppose we have a SAS dataset like the one below, where a UPC can have multiple
types and a type can have multiple values.

UPC  TYPE  VALUE
1    A     1
1    B     2
1    B     3
2    B     4
2    C     5
2    C     6
3    A     8
3    A     9
3    B     1
3    B     4
3    B     8
3    C     9
3    C     2
3    D     1

We need to reshape the data by UPC so that the output has one column for each TYPE
and the rows for each UPC represent all possible combinations of the values for that
UPC (also called a Cartesian expansion).

The output dataset should look like this (a period marks a missing value):

UPC  A  B  C  D
1    1  2  .  .
1    1  3  .  .
2    .  4  5  .
2    .  4  6  .
3    8  1  9  1
3    8  1  2  1
3    8  4  9  1
3    8  4  2  1
3    8  8  9  1
3    8  8  2  1
3    9  1  9  1
3    9  1  2  1
3    9  4  9  1
3    9  4  2  1
3    9  8  9  1
3    9  8  2  1
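
To make the target concrete, here is a minimal sketch that produces the same Cartesian expansion with PROC SQL. It is not the method this paper develops: it hard-codes the four TYPE values A to D, so it only suits a small, known set of types. The output name BB_SQL and the table aliases are illustrative.

```sas
/* Input with the variables UPC, TYPE and VALUE from the table above */
data aa;
  input upc type $ value @@;
  datalines;
1 A 1  1 B 2  1 B 3
2 B 4  2 C 5  2 C 6
3 A 8  3 A 9  3 B 1  3 B 4  3 B 8  3 C 9  3 C 2  3 D 1
;
run;

/* Join each UPC to its values of every TYPE. A left join keeps the UPC
   (with a missing value) when it has no record of that TYPE. */
proc sql;
  create table bb_sql as
  select u.upc,
         ta.value as A,
         tb.value as B,
         tc.value as C,
         td.value as D
  from (select distinct upc from aa) as u
       left join aa(where=(type='A')) as ta on u.upc = ta.upc
       left join aa(where=(type='B')) as tb on u.upc = tb.upc
       left join aa(where=(type='C')) as tc on u.upc = tc.upc
       left join aa(where=(type='D')) as td on u.upc = td.upc
  order by u.upc;
quit;
```

For the sample data this should give 2 rows for UPC 1, 2 rows for UPC 2 and 12 rows for UPC 3. The order of rows within a UPC is not guaranteed.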
A simple PROC TRANSPOSE does not produce this result.

PROC TRANSPOSE DATA = AA OUT = BB;
   BY UPC;
   ID type;
   VAR value;
RUN;

The log shows errors like the following.

ERROR: The ID value "B" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
UPC=1
ERROR: The ID value "C" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
UPC=2
ERROR: The ID value "A" occurs twice in the same BY group.
ERROR: The ID value "B" occurs twice in the same BY group.
ERROR: The ID value "B" occurs twice in the same BY group.
ERROR: The ID value "C" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
UPC=3
ERROR: All BY groups were bad.

We can add the LET option, which allows duplicate values of an ID variable.

PROC TRANSPOSE DATA = AA OUT = BB LET;
   BY UPC;
   ID type;
   VAR value;
RUN;

The output will look like

UPC  _NAME_  A  B  C  D
1    value   1  3  .  .
2    value   .  4  6  .
3    value   9  8  2  1

But this is not what we are looking for. The LET option keeps only the last occurrence of
a particular ID value within the data set or BY group, so the other values are lost.
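
Before transposing, it can help to list the UPC and TYPE combinations that carry more than one value, since these are the ones that trigger the errors and that LET would collapse. The following is a minimal sketch of such a check. The output name DUP_IDS and the column N_VALUES are illustrative.

```sas
/* Input with the variables UPC, TYPE and VALUE from the PROBLEM section */
data aa;
  input upc type $ value @@;
  datalines;
1 A 1  1 B 2  1 B 3
2 B 4  2 C 5  2 C 6
3 A 8  3 A 9  3 B 1  3 B 4  3 B 8  3 C 9  3 C 2  3 D 1
;
run;

/* Keep only the UPC/TYPE pairs that occur more than once */
proc sql;
  create table dup_ids as
  select upc, type, count(*) as n_values
  from aa
  group by upc, type
  having count(*) > 1
  order by upc, type;
quit;
```

For the sample data DUP_IDS should hold five rows: UPC 1 type B (2 values), UPC 2 type C (2), and UPC 3 types A (2), B (3) and C (2). These are the same BY groups and ID values named in the log.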

SOLUTION
One solution is to first separate the single-value and multiple-value records. Sort
dataset AA by UPC and TYPE and split it into two datasets: SV, holding the records whose
UPC and TYPE combination has only one value, and MV, holding the records whose UPC and
TYPE combination occurs more than once. For the MV dataset, generate all combinations
of values for each UPC and assign an index to each combination. Then combine the SV and
MV datasets so that every record carries the index of the combination it belongs to.
Within a UPC and index, each TYPE now occurs only once, so PROC TRANSPOSE can be
applied BY UPC and index. In the code, TYPE and VALUE are named chrtyp and chrvl, and
the index is named pi_index.

The code to generate the desired reporting dataset looks like:

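/* Upper bounds for the temporary arrays used in %xtrans */
%let MAXFIELDS   = 100 ;   /* maximum number of distinct types per UPC */
%let MAXVALUES   = 25 ;    /* maximum number of values per type within a UPC */
%let MAXFVMATRIX = %eval(&MAXFIELDS. * &MAXVALUES.) ;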

/* A macro to sort a dataset in place */

%macro sort(ds, by) ;
  proc sort data=&ds ;
    by &by ;
  run ;
%mend sort ;

/* A macro to store the number of records in a dataset in a global macro variable */

%macro nobs(ds, mvar) ;
  %global &mvar ;
  %let &mvar = 0 ;

  data _null_ ;
    set &ds nobs=nobs ;
    /* symputx stores the number without leading blanks or a conversion note */
    call symputx("&mvar", nobs) ;
    stop ;
  run ;
%mend nobs ;

/* A macro to do Transpose for multiple values */

%macro xtrans(in, out) ;
  %sort(&in, upc chrtyp) ;

  /* split by single and multiple values */
  data sv mv ;
    set &in ;
    by upc chrtyp ;
    if first.chrtyp and last.chrtyp
      then output sv ;
      else output mv ;
  run ;

  /* if there are multiple values: expand, transpose, merge with the single values */
  %nobs(mv, nobs) ;
  %if &nobs %then %do ;

    data
      tmp_xtrans(keep=upc pi_index output_type output_value sortedby=upc pi_index)
      sv_xid1(keep=upc pi_index sortedby=upc pi_index)
      ;
      set mv ;
      by upc chrtyp ;

      retain num_fields 0 ;

      array buf_field_names(&MAXFIELDS.) $50 _temporary_ ;                 /* type names */
      array buf_reps(&MAXFIELDS.) _temporary_ ;                            /* repeat factors */
      array buf_field_values(&MAXFIELDS., &MAXVALUES.) $100 _temporary_ ;  /* values per type */
      array buf_field_counts(&MAXFIELDS.) _temporary_ ;                    /* value counts */
      /* initialize upc info */
      if first.upc then do ;
        num_fields = 0 ;
      end ;

      /* new type: record its name and reset its value count */
      if first.chrtyp then do ;
        num_fields = num_fields + 1 ;
        buf_field_names(num_fields) = chrtyp ;
        buf_field_counts(num_fields) = 0 ;
      end ;

      /* add value to list of values for field
         (values beyond MAXVALUES are silently ignored) */
      if buf_field_counts(num_fields) < &MAXVALUES. then do ;
        buf_field_counts(num_fields) = buf_field_counts(num_fields) + 1 ;
        buf_field_values(num_fields, buf_field_counts(num_fields)) = chrvl ;
      end ;
      /* done with this UPC, output rows */
      if last.upc then do ;

        /* create combination index: buf_reps(i) is the number of consecutive
           rows that share one value of field i, pi_count is the number of
           rows to output */
        buf_reps(num_fields) = 1 ;
        pi_count = buf_field_counts(1) ;
        do i = num_fields to 2 by -1 ;
          buf_reps(i-1) = buf_reps(i) * buf_field_counts(i) ;
          pi_count = pi_count * buf_field_counts(i) ;
        end ;

        /* output one row per combination (sv_xid1) and one row per
           combination and type (tmp_xtrans): upc, pi_index, type, value */
        do j = 1 to pi_count ;
          pi_index = j ;
          output sv_xid1 ;
          do i = 1 to num_fields ;
            output_type = buf_field_names(i) ;
            xind = mod(ceil(j / buf_reps(i)), buf_field_counts(i)) ;
            if xind = 0 then xind = buf_field_counts(i) ;
            output_value = buf_field_values(i, xind) ;
            output tmp_xtrans ;
          end ;
        end ;

      end ;
      /* done with this UPC */
    run ;

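    /* repeat each single-value record once per combination index of its UPC */
    proc sql ;
      create table sv2 as
        select a.*, b.pi_index
        from sv a left join sv_xid1 b
          on a.upc = b.upc
        order by a.upc, b.pi_index ;   /* the BY statement below needs this order */
    quit ;

    data mv_t ;
      set sv2 tmp_xtrans(rename=(output_type=chrtyp output_value=chrvl)) ;
      by upc pi_index ;
    run ;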

    proc transpose data=mv_t out=&out.(drop=_: pi_index) ;
      by upc pi_index ;
      id chrtyp ;
      var chrvl ;
    run ;
  %end ;

  %else %do ;
    /* upc with single values */
    proc transpose data=sv out=&out.(bufno=4 drop=_:) ;
      by upc ;
      id chrtyp ;
      var chrvl ;
    run ;
  %end ;

%mend xtrans ;

/* create input Dataset */

data AA;
infile datalines dlm = ',';
input upc chrtyp $ chrvl $;
datalines;
001, A, 1
001, B, 2
001, B, 3
002, B, 4
002, C, 5
002, C, 6
003, A, 8
003, A, 9
003, B, 1
003, B, 4
003, B, 8
003, C, 9
003, C, 2
003, D, 1
;
run;

/* Call Macro */

%xtrans(AA,BB);

num_fields holds the number of distinct TYPE values for a UPC. The buf_field_counts
array holds the number of values of each TYPE for a UPC, buf_field_names holds the
names of the distinct TYPE values, and buf_field_values holds all values of each TYPE.
Because the DATA step reads MV, these counts cover only the TYPE values that occur more
than once for the UPC. The example assumes at most 100 (MAXFIELDS) such TYPE values per
UPC and at most 25 (MAXVALUES) values per TYPE.
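
The index arithmetic can be checked in isolation. The following minimal sketch repeats the buf_reps recurrence and the xind formula from the macro for UPC 3, whose multi-valued types are A (8, 9), B (1, 4, 8) and C (9, 2). The data set INDEX_DEMO and the names COUNTS, REPS, XA, XB and XC are illustrative and do not appear in the macro.

```sas
/* Positions of the chosen value per type for each combination index of UPC 3 */
data index_demo(keep=pi_index xa xb xc);
  array counts(3) _temporary_ (2 3 2);   /* number of values for A, B, C */
  array reps(3)   _temporary_;           /* rows that share one value of a type */
  array x(3) xa xb xc;                   /* position of the chosen value per type */

  /* same recurrence as buf_reps: the last type changes on every row */
  reps(3) = 1;
  do i = 3 to 2 by -1;
    reps(i-1) = reps(i) * counts(i);
  end;

  do pi_index = 1 to 12;                 /* 2 * 3 * 2 combinations */
    do i = 1 to 3;
      x(i) = mod(ceil(pi_index / reps(i)), counts(i));
      if x(i) = 0 then x(i) = counts(i); /* remainder 0 means the last value */
    end;
    output;
  end;
run;

proc print data=index_demo noobs;
run;
```

Here REPS works out to 6, 2 and 1. Row 1 gives the positions (1, 1, 1), that is A=8, B=1, C=9, and row 2 gives (1, 1, 2), that is A=8, B=1, C=2, which matches the first two UPC 3 rows of the desired output.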

CONCLUSION
This technique transposes datasets in which the VAR variable has multiple occurrences
for the same ID value within a BY group, without losing any record. The code provides a
basic programming structure that, with minor modification, can be adapted to produce
the desired output.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:

Kanhaiya Lal Gupta
Tata Consultancy Services Ltd
Fond Du Lac, WI 54935
Work Phone: (920) 929-7870
E-mail: kanhaiya.gupta@tcs.com
