SAS Training
Quality improvement
Applications development
The core of the SAS system is base SAS software, which consists of:
SAS Language
SAS Procedures
SAS Macros
ODS
Windowing Environment
SAS Files
Data Step
Procedure Step
SAS Informats
SAS Formats
Variables
Functions
Statements
SAS GUI
1. Project Designer
2. Project Explorer
3. Code Editor
4. Server List
5. Log Window
6. Output Window
SAS Programs
SAS programs can be used to access, manage, analyze, or present your data
SAS Libraries
Temporary library
Permanent library
Depending on the library name that is used when you create a file, SAS files can be stored
temporarily or permanently
Temporary Library:
Permanent Library:
Local
SASuser
SAShelp
The LIBNAME statement is global, which means that librefs remain in effect until you modify
them, cancel them, or end your SAS session
The LIBNAME statement assigns the libref for the current SAS session only
Assign a libref to each permanent SAS data library each time a SAS session starts
Once the libref is cleared or the SAS session ends, SAS no longer has access to the files in the
library through that libref; the files themselves are not deleted
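As a minimal sketch of the points above, the following assigns a libref, uses it, and then cancels it. The folder path is a hypothetical placeholder and must be replaced with a directory that exists on your system; the libref name clinic is borrowed from later examples in these notes.

```sas
/* Hypothetical path: point this at an existing folder of SAS files */
libname clinic '/home/user/clinic_data';

/* List the SAS files in the library to confirm that the libref works */
proc contents data=clinic._all_ nods;
run;

/* Cancel the libref; the files stay on disk but are no longer reachable as clinic.* */
libname clinic clear;
```

A libref can be at most 8 characters long, which is why short names such as clinic are used.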
Syntax :
Example:
Here,
Taxes
libname -
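/* Assumes the libref lib1 has already been assigned with a LIBNAME statement */
Data lib1.emp;
Length name $ 12;
/* Informats control how doj and sal are read; formats control how they are displayed */
Informat doj mmddyy8. sal dollar7.;
Format doj date9. sal dollar7.;
Input id name $ doj sal;
Label id = 'Employee Id' name = 'Employee Name' doj = 'Date of Joining'
sal = 'Salary';
Cards;
1076 abcasdayut 12/23/05 $10,000
1983 aaaertgr 07/12/98 $40,000
1723 xyzasdsf 04/15/98 $25,000
;
Run;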
Many of the data processing tasks access data in the form of a SAS data set and analyze,
manage, or present the data
A SAS data set also points to one or more indexes, which enable SAS to locate records in the
data set more efficiently
Payroll
LABDATA1995_1997
_EstimatedTaxPayments3
Descriptor portion
Data portion
Descriptor Portion:
The descriptor portion of a SAS data set contains information about the data set, including:
Data Portion:
Example:
Here,
Observations:
An observation is a collection of data values that usually relate to a single object
The values Jones, M, 48, and 128.6 constitute a single observation in the data set shown
below
Variables:
The values Jones, Laverne, Jaffe, and Wilson constitute the variable Name in the data set
shown below
Missing Values:
Variable Attributes:
In addition to general information about the data set, the descriptor portion contains information
about the attributes of each variable in the data set
Example: Listing of the attribute information in the descriptor portion of the SAS data set
Clinic.Insure
Variable  Type  Length  Format  Informat  Label
Policy    Num                             Policy Number
Total     Num
Name      Char  20                        Patient Name
Name:
Variable names follow exactly the same rules as SAS data set names
Type:
Character variables, such as Name (shown below), can contain any values
Numeric variables, such as Policy and Total (shown below), can contain only numeric values
(the digits 0 through 9, +, -, ., and E for scientific notation)
Length:
A variable's length (the number of bytes used to store it) is related to its type
In the example below, Name has a length of 20 characters and uses 20 bytes of storage.
Numeric values (no matter how many digits they contain) are stored as floating-point numbers in
8 bytes of storage, unless you specify a different length.
Format:
Informat:
Label:
A variable can have a label consisting of descriptive text up to 256 characters long
To display more descriptive information about a variable, assign a label to that variable
Example:
Label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to
display these labels in reports
Two-Level Names:
Two-level names are used to reference a permanent SAS file in SAS programs
libref.filename
libref is the name of the SAS data library that contains the file
filename is the name of the file itself
Example:
Clinic.Admit is the two-level name for the SAS data set Admit
To reference temporary SAS files, specify the default libref Work, a period, and the filename
Example:
Here,
The two-level name Work.Test references the SAS data set named Test that is stored in the
temporary SAS library Work
One-Level name
One-level name (the filename only) can be used to reference a file in a temporary SAS library
Example:
Here,
The one-level name Test also references the SAS data set named Test that is stored in the
temporary SAS library Work.
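The two naming styles can be compared side by side. This is a small illustrative sketch; the variable x and its value are made up for the demonstration, and it assumes the default setup in which one-level names resolve to the Work library.

```sas
/* Two-level name: libref Work, a period, and the filename Test */
data work.test;
   x = 1;
run;

/* One-level name: SAS assumes the Work library, so this prints the same data set */
proc print data=test;
run;
```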
Data Step
Proc Step
A DATA step
A PROC step
Combination of DATA and PROC step
Data Step:
DATA steps typically create or modify SAS data sets; they can also be used to produce
custom-designed reports
Compute values
Produce new SAS data sets by subsetting, merging, and updating existing data sets
Proc Step:
PROC steps are pre-written routines that enable us to analyze and process the data in a SAS data
set and to present the data in the form of a report
PROC steps sometimes create new SAS data sets that contain the results of the procedure
The fig below shows how to design and write a DATA step program to create a SAS data set
from raw data that is stored in an external file
Data step:
Data & Set Statements:
Syntax:
DATA <dataset1> ;
SET <dataset2> ;
Where,
Data can be entered into a SAS data set directly through a SAS program
Reading instream data is useful when you want to create data and test programming statements on
a few observations
To read instream data, use:
a DATALINES statement as the last statement in the DATA step (except for the RUN
statement), immediately preceding the data lines
a null statement (a single semicolon) to indicate the end of the input data
If the data contains semicolons, use the DATALINES4 statement plus a null statement that
consists of four semicolons (;;;;) to indicate the end of the input data
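A short sketch of the DATALINES4 case: the data set name, variable names, and data values here are invented for illustration. The & modifier is used so that the text value can contain single blanks.

```sas
data comments;
   /* & reads text up to two consecutive blanks (or end of line), keeping single blanks */
   input id text & $30.;
   datalines4;
1 first part; second part
2 no semicolon here
;;;;
run;

proc print data=comments;
run;
```

With plain DATALINES, the semicolon in the first data line would be taken as the end of the data.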
Syntax:
DATA <datasetname>;
INPUT <variablename1>[$] <variablename2>[$] ;
DATALINES;
.
.
data lines go here
.
.
;
run ;
After typing in the values give a semicolon to indicate the end of the data values.
Example:
Data emp_details ;
/* & lets name contain single blanks; two blanks mark the end of the value */
Input id name & $14. age ;
Datalines ;
2458 Murray, W  42
2462 Almers, C  38
2501 Bonaventure, T  48
2523 Johnson, R  39
2539 LaMance, K  45
2544 Jones, M  49
;
run ;
Here,
A dataset called emp_details is created with variables id, name & age, and having 6
observations
The SAS GUI can be used to import data from different file types, such as:
Excel File
The PROC IMPORT step can be used to import an external file of different file types
Syntax:
proc import datafile = "external-file-path" out = <dataset name> dbms = <file type> replace;
delimiter = "special character" ; getnames = <yes/no> ; datarow = n ;
run;
Where,
External file path is the path of the external file to import
Out= specifies the data set to be created from the imported file
dbms= specifies the type of file to be imported (for example csv or tab), or dlm for other
delimited files
replace overwrites the output data set if it already exists
getnames=yes tells SAS to read the variable names from the first line of the data file
delimiter= specifies the delimiter used in the external file. It is specified only when
dbms=dlm is specified
datarow=n specifies the row of the external file from which SAS starts reading data,
where n is a number
A comma-separated file is a special external file with the file extension .csv (comma-separated
values)
proc import datafile="comma.csv" out= mydata dbms=csv replace;
getnames=no;
run;
Here,
A comma separated file called comma.csv is imported
A new dataset called mydata is created
getnames=no indicates that the first row in the file is not variable names
replace tells SAS to replace the existing data set mydata
Example 2:
Another way of reading a comma-delimited file is to treat the comma as an ordinary delimiter
Here is a program that shows how to use dbms=dlm and delimiter=","
proc import datafile="comma1.txt" out=mydata dbms=dlm replace;
Delimiter = "," ;
Getnames = yes ;
Datarow = 5 ;
Run ;
Here,
comma1.txt is a comma separated text file whose variable values are separated by
commas
Here,
tab.txt is a tab separated text file
dbms=tab indicates tab.txt as tab separated file
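The tab-delimited case described here might look like the following sketch. The file name tab.txt comes from the notes; the output name mydata and the getnames=yes setting are assumptions carried over from the earlier examples.

```sas
/* Assumes tab.txt is in the current directory and its first row holds variable names */
proc import datafile="tab.txt" out=mydata dbms=tab replace;
   getnames=yes;
run;
```

Because dbms=tab already identifies the delimiter, no delimiter= statement is needed.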
Data Understanding
Proc Contents Step:
The CONTENTS procedure is used to create SAS output that describes either of the following:
Describes the structure of the data set rather than the data values
Displays valuable information at the...
Name
Engine
Creation date
Number of observations
Number of variables
File size (bytes)
Variable level
Name
Type
Length
Formats
Position
Label
Syntax:
PROC CONTENTS DATA=libref._ALL_ NODS;
Where,
libref is the libref that has been assigned to the SAS library.
NODETAILS (NODS) suppresses the printing of detailed information about each file
when _ALL_ is specified.
Example:
To view the contents of the Mylib library, submit the following PROC CONTENTS step:
proc contents data = mylib._all_ nods ;
run ;
The output from this step lists only the names, types, sizes, and modification dates for the SAS
files in the Mylib library
To view the descriptor information for the Mylib.Admit data set, submit the following PROC
CONTENTS step:
proc contents data = mylib.admit ;
run ;
The output from this step lists information for Mylib.Admit data set, including an alphabetic list of
the variables in the data set
Proc Print:
Prints a listing of the values of some or all of the variables in a SAS data set
Syntax:
proc print data = libref.Datasetname [ (firstobs = n obs = n) ] [ split = 'split character'
double label n noobs ] ;
[
Id Variable list ;
Var Variable list ;
By Variable list ;
Sum Variable list ;
]
Run ;
Where,
Items in [ ] are optional
Libref is the library that contains Datasetname, the data set whose values are to be
printed
Split = 'split character' - splits labels used as column headings across multiple lines wherever
the split character appears
Label - uses variable labels as column headings (the variable name is the default heading)
Id - identifies observations by the formatted values of the listed variables instead of by
observation numbers
Var - selects the variables that appear in the report and determines their order
Example:
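As an illustrative sketch (not the example from the original slides), the following uses the sample data set Sashelp.Class, which ships with SAS, to combine several of the options described above. The label text is made up for the demonstration.

```sas
proc print data=sashelp.class (firstobs=1 obs=10) label split='*';
   id name;                      /* identify rows by Name instead of Obs */
   var age height weight;        /* choose the columns and their order   */
   sum weight;                   /* print a total for Weight             */
   label height = 'Height*(inches)'
         weight = 'Weight*(pounds)';
run;
```

The asterisk in each label marks where the column heading breaks onto a second line.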
A new data set can be created from an existing SAS data set
To create the new data set, read the existing data set with a DATA step and use the programming
features of the DATA step to manipulate the data
Store the manipulated data in a new data set, or in a data set with the same name, which
overwrites the existing data
Syntax 1:
Data SAS-data-set;
Set SAS-data-set;
Run;
where ,
SAS-data-set in the DATA statement is the name (libref.filename) of the SAS data set to
be created (Destination Data Set)
SAS-data-set in the SET statement is the name (libref.filename) of the SAS data set to
be read (Source Data Set)
Example:
Where,
Lab23 and Research are librefs for two libraries in different locations
The DATA statement creates the permanent SAS data set Drug1H
Drug1H is stored in the SAS data library to which the libref Lab23 has been
assigned
The SET statement reads the permanent SAS data set Research.CLTrials
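The step described above can be sketched as follows. The data set names come from the notes; the two folder paths are hypothetical placeholders.

```sas
/* Hypothetical paths: each libref must point at an existing folder */
libname lab23    '/data/lab23';
libname research '/data/research';

/* Read Research.CLTrials and write a copy named Lab23.Drug1H */
data lab23.drug1h;
   set research.cltrials;
run;
```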
Syntax 2: Data transfer from one library to another using Proc Copy
Libref1 is the library from which the data sets are to be copied
Select is a statement that selects the data sets Ds1, Ds2, etc. to copy from libref1 to libref2
If Select is not used, all the data sets in libref1 are copied to libref2
Example:
Here,
Data Set admit is copied from clinic libref to temporary library work
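A minimal sketch of the copy described above, assuming the libref clinic has already been assigned and contains a data set named admit:

```sas
/* Copy only Admit from the Clinic library into the temporary Work library */
proc copy in=clinic out=work;
   select admit;
run;
```

Leaving out the SELECT statement would copy every member of the Clinic library.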
Firstobs
Obs
Label
Rename
Delete
Drop
Keep
by group
point= option
Output
END= option
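The Firstobs= and Obs= options are used to select a range of observations from a data set
When used in a PROC PRINT step, the output displays only the selected observations
If only Firstobs= is specified, observations from that position to the end of the file are selected
If only Obs= is specified, observations from the first through the specified observation number
are selected
Obs= gives the number of the last observation to read, not a count, so firstobs=10 obs=100
selects observations 10 through 100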
Syntax:
data SAS-Data-Set;
Set SAS-Data-Set (firstobs = n obs = n);
run;
Where,
Firstobs= and Obs= are data set options for input data sets, so they follow the data set name in
the SET statement (or in the DATA= option of a PROC step); they cannot be applied to the output
data set named in the DATA statement
Example:
data candy_products;
set local.candy_products (firstobs=10 obs=100);
run;
Here,
A label can be assigned temporarily in a PROC step or permanently in a DATA step
A label assigned in a DATA step is stored in the descriptor portion of the data set and is shown
whenever the data set is printed with PROC PRINT and the LABEL option
A RENAME statement in a DATA step permanently renames the variable in the output data set
Syntax:
Example:
Data demo.class;
Set demo.class ;
Label sizehh = 'Size of household';
Rename sizehh = sizehouse;
Run;
proc print data = demo1.class1 Label;
label sizehh = 'Size of Household';
run;
Here,
Size of household is permanently assigned as the label for the variable Sizehh in the DATA step
Size of Household is assigned as the label for Sizehh temporarily in the PROC step, so it is in
effect only while that step executes
The DROP= and KEEP= data set options can be used in a DATA step to drop or keep variables in a
data set
Use the KEEP= option instead of the DROP= option if more variables are dropped than kept
Specify the DROP= and KEEP= options in parentheses after a SAS data set name
Syntax:
(DROP = variable(s))
(KEEP = variable(s))
where ,
the DROP= or KEEP= option, in parentheses, follows the name of the data set that
contains the variables to be dropped or kept
Example:
1. Timemin and Timesec are dropped from the data set clinic.stress
data clinic.stress (drop= timemin timesec);
Set clinic.stress;
Run;
2.
Another way to exclude variables from data set is to use the DROP statement or the KEEP
statement
Like the DROP= and KEEP= data set options, these statements drop or keep variables
The DROP statement differs from the DROP= data set option in the following ways:
The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies
a list of variables to write to output data sets
Use the KEEP statement instead of the DROP statement if the number of variables to keep is
significantly smaller than the number to drop
Syntax:
DROP variable(s);
KEEP variable(s);
Where,
Example:
data clinic.stress;
Set clinic.stress;
drop timemin timesec;
Run;
Here,
The WHERE statement can be used to select observations in both PROC steps and DATA steps
Syntax:
Where where-expression;
Where,
The WHERE statement works for both character and numeric variables
Following comparison operators can be used to express a condition in the WHERE statement:
Symbol     Meaning                    Example
= or eq    equal to
^= or ne   not equal to
> or gt    greater than               where income>20000;
< or lt    less than
>= or ge   greater than or equal to   where id>='1543';
<= or le   less than or equal to
The CONTAINS operator selects observations that include the specified substring.
Example:
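A small sketch of the CONTAINS operator, using the sample data set Sashelp.Class that ships with SAS (an assumption; the original example was not preserved). The comparison is case sensitive.

```sas
/* Select students whose Name includes the substring 'Ja' */
proc print data=sashelp.class;
   where name contains 'Ja';
run;
```

The question mark can be used as a shorthand for CONTAINS, as in where name ? 'Ja';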
WHERE statements can be used to select observations that meet multiple conditions
To link a sequence of expressions into compound expressions, use logical operators, including
the following:
Operator    Meaning
AND or &    both expressions must be true
OR or |     at least one expression must be true
Example:
1.
3.
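Compound conditions can be sketched as follows, again using Sashelp.Class as stand-in data (the original examples were not preserved). Parentheses make the order of evaluation explicit, since AND is evaluated before OR.

```sas
/* Girls who are at least 14 years old or taller than 60 inches */
proc print data=sashelp.class;
   where sex = 'F' and (age >= 14 or height > 60);
run;
```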
The IF-THEN statement executes a SAS statement when the condition in the IF clause is true
Comparison and logical operators can be used in the IF conditional expression
Any numeric value other than 0 or missing is true, and a value of 0 or missing is false
Syntax:
IF expression THEN statement;
Example:
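Data clinic.stress;
Set clinic.stress;
/* Without LENGTH, TestLength would get length 4 from 'Long' and truncate 'Normal' and 'Short' */
length TestLength $ 6;
if totaltime > 800 then TestLength = 'Long';
else if 750 <= totaltime <= 800 then TestLength ='Normal';
else if totaltime < 750 then TestLength = 'Short';
Run;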
Here,
If the first IF expression is not true, SAS checks the next expression; if that one is true,
the assignment is made and the remaining ELSE statements are skipped
If neither the first nor the second expression is true, control passes to the third expression,
which assigns Short to TestLength when it is true
The IF-THEN statement can be combined with the DELETE statement to remove selected observations
from a data set
Syntax:
IF expression THEN DELETE;
If the expression is
true, the DELETE statement executes, and control returns to the top of the DATA step
(the observation is deleted).
false, the DELETE statement does not execute, and processing continues with the next
statement in the DATA step
Example:
Data clinic.stress;
Set clinic.stress;
if resthr < 70 then delete;
Run;
Here,
The IF-THEN and DELETE statements above omit any observations whose values for
RestHR are lower than 70
When there is a long series of mutually exclusive conditions and the comparison is numeric,
using a SELECT group is slightly more efficient than using a series of IF-THEN or IF-THEN/ELSE statements because CPU time is reduced
SELECT groups also make the program easier to read and debug.
Syntax:
SELECT <(select-expression)>;
WHEN-1 (when-expression-1 <..., when-expression-n>) statement;
WHEN-n (when-expression-1 <..., when-expression-n>) statement;
<OTHERWISE statement;>
END;
Where,
The optional select-expression specifies any SAS expression that evaluates to a single value.
WHEN identifies SAS statements that are executed when a particular condition is true.
Example:
The SELECT group assigns values to the variable Group based on values of the variable
JobCode
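Such a SELECT group might look like the sketch below. The variable names Group and JobCode come from the notes; the job codes, group names, and data set name are hypothetical values chosen for illustration.

```sas
data jobs;
   length JobCode $ 8 Group $ 12;
   input JobCode $;
   select (JobCode);
      when ('FA1', 'FA2') Group = 'Attendant';  /* either code matches */
      when ('ME1', 'ME2') Group = 'Mechanic';
      otherwise           Group = 'Other';      /* no WHEN matched     */
   end;
   datalines;
FA1
ME2
PT3
;
run;

proc print data=jobs;
run;
```

Including OTHERWISE is a safe habit: without it, SAS stops with an error when no WHEN condition is met.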
The observations in each data set are stacked in the order in which the data sets are listed
to form the new data set
Appends the observations from one data set to another data set
Syntax:
DATA output-SAS-data-set;
SET SAS-data-set-1 SAS-data-set-2;
RUN;
Where,
Example:
Data combined;
Set A C;
Run;
PROC APPEND adds the observations from the data file to the end of the base file.
It works without extra options only if the base file has all the variables that are in the data
file; otherwise, use the FORCE option
Syntax:
PROC APPEND BASE=base-data-set DATA=data-data-set <FORCE>;
RUN;
Where,
Force is an optional keyword, used when the data file has variables that are missing from
the base file, to force SAS to append
Example:
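The following sketch builds two tiny data sets (names and values invented for illustration) and appends one to the other. The data file has an extra variable, age, so FORCE is required; SAS then drops age and writes a warning to the log.

```sas
data work.base;
   input id name $;
   datalines;
1 Ann
2 Bob
;
run;

data work.extra;
   input id name $ age;
   datalines;
3 Cara 30
;
run;

/* FORCE lets the append proceed even though work.base has no age variable */
proc append base=work.base data=work.extra force;
run;
```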
Merging
A merge combines observations from two or more SAS data sets based on the values of
specified common variables (one or more)
Syntax:
DATA output-SAS-data-set;
MERGE SAS-data-set-1 SAS-data-set-2;
BY <DESCENDING> variable(s);
RUN;
Where,
variable(s) in the BY statement specifies one or more variables whose values are used
to match observations
DESCENDING indicates that the input data sets are sorted in descending order by the
variable that is specified
If there is more than one variable in the BY statement, DESCENDING applies only to
the variable that immediately follows it
Each input data set in the MERGE statement must be sorted in order of the values of
the BY variable(s)
Each BY variable must have the same type in all data sets to be merged
The SORT procedure can be used to sort data sets in either ascending or descending order
Syntax:
Proc Sort Data = Data-Set-1 [out = Data-Set-2];
By [Descending] Variable1 [Variable2 ];
Run;
Here,
If the OUT= option is specified, the sorted observations are written to Data-Set-2 and the
original data set (Data-Set-1) remains unsorted; without OUT=, Data-Set-1 itself is replaced by
the sorted version
The BY statement sorts the data set according to the variables specified
The Descending option sorts the data set in descending order by the variable that immediately
follows it
Example:
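A minimal sketch using the sample data set Sashelp.Class that ships with SAS (an assumption; the original example was not preserved). The output data set name is made up.

```sas
/* Sort by Sex ascending, then by Age descending within each Sex */
proc sort data=sashelp.class out=work.class_sorted;
   by sex descending age;
run;

proc print data=work.class_sorted;
run;
```

Because OUT= is used, Sashelp.Class itself is left unchanged.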
During match-merging SAS sequentially checks each observation of each data set to see whether
the BY values match, then writes the combined observation to the new data set
data merged;
merge a b;
by num;
run;
1. Clinic.Demog
proc sort data=clinic.demog;
by id;
run;
proc print data=clinic.demog;
run;
Obs  ID    Age  Sex  Date
1    A001  21        05/22/75
2    A002  32        06/15/63
3    A003  24        08/17/72
4    A004  .         03/27/69
5    A005  44        02/24/52
6    A007  39        11/11/57
2. Clinic.Visit
proc sort data=clinic.visit;
by id;
run;
proc print data=clinic.visit;
run;
Obs  ID    Visit  SysBP  DiasBP  Weight  Date
1    A001         140    85      195     11/05/98
2    A001         138    90      198     10/13/98
3    A001         145    95      200     07/04/98
4    A002         121    75      168     04/14/98
5    A003         118    68      125     08/12/98
6    A003         112    65      123     08/21/98
7    A004         143    86      204     03/30/98
8    A005         132    76      174     02/27/98
9    A005         132    78      175     07/11/98
10   A005         134    78      176     04/16/98
11   A008         126    80      182     05/22/98
Example: Merging
data clinic.merged;
merge clinic.demog clinic.visit;
by id;
run;
Obs  ID    Age  Sex  Date      Visit  SysBP  DiasBP  Weight
1    A001  21        11/05/98         140    85      195
2    A001  21        10/13/98         138    90      198
3    A001  21        07/04/98         145    95      200
4    A002  32        04/14/98         121    75      168
5    A003  24        08/12/98         118    68      125
6    A003  24        08/21/98         112    65      123
7    A004  .         03/30/98         143    86      204
8    A005  44        02/27/98         132    76      174
9    A005  44        07/11/98         132    78      175
10   A005  44        04/16/98         134    78      176
11   A007  39        11/11/57         .      .       .
12   A008  .         05/22/98         126    80      182
By default, DATA step match-merging combines all observations in all input data sets
To exclude unmatched observations from the output data set, use the IN= data set option and the
subsetting IF statement in the DATA step:
Use the IN= data set option to create and name a variable that indicates whether the data set
contributed data to the current observation
Use the subsetting IF statement to check the IN= values and to write to the merged data set only
those observations that appear in the data sets for which IN= is specified
Syntax:
(IN=variable)
Where,
Within the DATA step, the value of the variable is 1 if the data set contributed data to
the current observation. Otherwise, its value is 0.
Example:
To Match-merge the data sets Clinic.Demog and Clinic.Visit and select only observations that
appear in both data sets :
The first IN= creates the temporary variable indemog, which is set to 1 when an
observation from Clinic.Demog contributes to the current observation; otherwise, it is
set to 0
IF statement is used to select only observations that appear in both Clinic.Demog and
Clinic.Visit
If the condition is met, the new observation is written to Clinic.Merged. Otherwise, the
observation is deleted
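data clinic.merged;
/* RENAME= keeps both dates; otherwise Date from Clinic.Visit overwrites Date from Clinic.Demog */
merge clinic.demog (in=indemog rename=(date=BirthDate))
clinic.visit (in=invisit rename=(date=VisitDate));
by id;
if indemog=1 and invisit=1;
run;
proc print data=clinic.merged;
run;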
Output:
Obs  ID    Age  Sex  BirthDate  Visit  SysBP  DiasBP  Weight  VisitDate
1    A001  21        05/22/75          140    85      195     11/05/98
2    A001  21        05/22/75          138    90      198     10/13/98
3    A001  21        05/22/75          145    95      200     07/04/98
4    A002  32        06/15/63          121    75      168     04/14/98
5    A003  24        08/17/72          118    68      125     08/12/98
6    A003  24        08/17/72          112    65      123     08/21/98
7    A004  .         03/27/69          143    86      204     03/30/98
8    A005  44        02/24/52          132    76      174     02/27/98
9    A005  44        02/24/52          132    78      175     07/11/98
10   A005  44        02/24/52          134    78      176     04/16/98
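Here X and Y are taken to be the IN= variables of the first and second data sets in the MERGE
statement
Condition             Description
No condition          Includes all the observations from both data sets
If Y = 1              Includes all the observations to which the second data set contributed
If X = 1              Includes all the observations to which the first data set contributed
If X = 1 and Y = 1    Exact join: only the observations found in both data sets
If X = 0 or Y = 0     Outer join: only the observations found in just one of the data sets
If X = 0 and Y = 1    Only the observations found in the second data set alone
If X = 1 and Y = 0    Only the observations found in the first data set alone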
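The conditions above can be tried out with a small sketch. It assumes that X and Y are the IN= variables of the first and second data sets; the data set names, variable names, and values are all invented for illustration.

```sas
data left_side;
   input id x_value;
   datalines;
1 10
2 20
3 30
;
run;

data right_side;
   input id y_value;
   datalines;
2 200
3 300
4 400
;
run;

/* Both inputs are already in id order, so no PROC SORT is needed here */
data both only_left only_right;
   merge left_side (in=x) right_side (in=y);
   by id;
   if x = 1 and y = 1 then output both;            /* ids 2 and 3 */
   else if x = 1 and y = 0 then output only_left;  /* id 1        */
   else if x = 0 and y = 1 then output only_right; /* id 4        */
run;
```

Routing each case to its own output data set makes it easy to inspect which BY values matched and which did not.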