Lesson 1:
http://www.youtube.com/watch?v=dwKfZq9lPFM
SAS Interface:
The SAS window is divided into two panes. The left pane has 2 tabs: EXPLORER
and RESULTS. The right pane has 3 tabs: OUTPUT, LOG and EDITOR.
Explorer: lists files, libraries etc. A data set created without naming a
library is stored in the default library, WORK, and opening it displays the
data in a grid, much like an Excel sheet.
Results: it lists the results produced by each run. Every time the code is
changed and run again, a new result is created.
Editor: here we write the code to be executed, for example:
data main;
x=1;
run;
This creates a data set with one variable and one observation:
X
1
Log: it shows the details of what SAS did in order to execute the code, such
as the number of observations and variables in the data set.
Lesson 2:
http://www.youtube.com/watch?v=MFd7wRoH5xw
SAS Code:
The code above is the simplest case: one value for one variable. If we want
more than one variable, we name the variables on the INPUT statement and put
the values for these variables after CARDS:
data main;
input x y z;
cards;
2 4 6
7 8 9
;
run;
proc print data=main;
run;
Lesson 3:
https://www.youtube.com/watch?v=4UaAlL7VwY0
Importing External Data:
1. From Text file:
/* TEMPLATED CODE: .txt file type, with or without delimiters */
/* LRECL = a logical record length long enough to encompass an ENTIRE line of
data. For example, in the code above a line is 5 characters long (x space y
space z), but we write 100; choose a value that suits your data. */
/* DLM = the delimiter used to separate values in the data, such as a comma
or a space (x,y,z or x y z). We used a space. */
data [appropriate data set name here];
infile "[your file location here, including .txt extension]"
LRECL=[a logical length of your data to encompass ENTIRE data]
DLM=',';
input
[variable names here];
run;
We used:
data infile_main;
infile "C:\My SAS Files\main.txt"
LRECL=100
DLM=' ';
input x y z;
run;
But this will give an error, because SAS starts reading from the top of the
file and the top line holds the variable names. To overcome this we add the
FIRSTOBS= option to the INFILE statement, right after DLM=:
DLM=' ' firstobs=2;
Now the code will run, as it tells SAS to start reading from the second line.
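Putting the pieces together, the complete step might look like the sketch below. The file path is the one used above. The second step is a hypothetical comma-delimited variant (main.csv is an assumed file name) in which the DSD option is added so that two consecutive commas are read as a missing value.

```sas
/* Space-delimited file whose first line holds the variable names */
data infile_main;
  infile "C:\My SAS Files\main.txt" lrecl=100 dlm=' ' firstobs=2;
  input x y z;
run;

/* Assumed comma-delimited variant: main.csv is a hypothetical file */
data infile_csv;
  infile "C:\My SAS Files\main.csv" lrecl=100 dlm=',' dsd firstobs=2;
  input x y z;
run;

proc print data=infile_main;
run;
```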
2. From SAS data set:
As noted earlier, data sets are stored in the WORK library by default. But if
we want, we can store a data set in another library by using the following
code:
run;
We Used:
proc import out=imported_excel
datafile='C:\My SAS Files\main.xls'
dbms=excel replace;
*Optional statements are below;
sheet='Sheet1';
getnames=yes;
run;
*PROC IMPORT can also be used to import Access, CSV, DBF and other files. You
only have to change the DBMS= option, e.g. dbms=csv or dbms=access.
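For instance, a CSV import could be sketched as follows. The file main.csv and the output name imported_csv are assumptions made for illustration. GUESSINGROWS= is optional and only controls how many rows SAS scans to decide each column's type.

```sas
/* Hypothetical CSV import: main.csv is an assumed file name */
proc import out=imported_csv
  datafile='C:\My SAS Files\main.csv'
  dbms=csv replace;
  getnames=yes;       /* first row holds the variable names */
  guessingrows=100;   /* rows scanned to guess column types */
run;

proc print data=imported_csv;
run;
```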
Helpful Notes:
1. The LIBNAME statement is used to point SAS towards a specific folder on
your computer.
2. The INFILE statement "reads" data into SAS if it is of a certain format
(usually comma, space, or tab delimited).
3. PROC IMPORT - imports data of any of several different file formats into
SAS.
4. The SET statement imports data from a library into SAS at the DATA STEP.
5. The library name in a data step's data name "writes" data from SAS into
your library folder using SAS's own file format system.
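Notes 1, 4 and 5 can be put together in a short sketch. The library name mylib and the data set name from_library are assumptions chosen for illustration, and the folder is the one used in the examples above; any valid folder would do.

```sas
/* Point SAS at a folder and give it the (assumed) library name mylib */
libname mylib "C:\My SAS Files";

/* A two-level name (library.dataset) writes the data set into that folder */
data mylib.main;
  input x y z;
  cards;
1 2 3
7 8 9
;
run;

/* SET reads the stored data set back into a new WORK data set */
data from_library;
  set mylib.main;
run;

proc print data=from_library;
run;
```

A data set written this way stays in the folder after the SAS session ends, whereas anything in WORK is deleted when SAS closes.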
Lesson 4:
https://www.youtube.com/watch?v=zlDMwF3kQ6s
Merging data Sets:
If we have the following two data steps:
data main;
input x y z;
cards;
1 2 3
7 8 9
;
run;
and
data more_people;
input x y z;
cards;
1 2 3
4 5 6
;
run;
Now if we want to combine these two, we have to create a new data set using
the code:
/* 1. Use one SET statement when you have the same variables, but
different observations */
data final_one;
set main more_people;
run;
After selecting all 3 data steps and running a print statement on final_one,
we get:
Obs   X   Y   Z
1     1   2   3
2     7   8   9
3     1   2   3
4     4   5   6
/* 2. Use two SET statements when you have different variables, but
the same observations */
First data set is main and now second is more_vars with different variables:
data more_vars;
input a b c;
cards;
20 40 60
10 20 30
;
run;
To combine main and more_vars we use a data step with 2 SET statements:
data new_final;
set main;
set more_vars;
run;
Now, after selecting all 3 data steps and running the print statement:
proc print data=new_final;
run;
We get:
Obs   X   Y   Z   A    B    C
1     1   2   3   20   40   60
2     7   8   9   10   20   30
run;
We get:
Obs   X   Y   Z   A    B    C
1     1   2   3   20   40   60
2     2   .   .   11   12   13
3     3   .   .   14   15   16
4     7   8   9   10   20   30
run;
We get:
Obs   X   Y   Z   A    B    C
1     1   2   3   20   40   60
2     7   8   9   10   20   30
Helpful Notes:
1. Use one SET statement when you have the same variables, but different
observations.
2. Use two SET statements when you have different variables, but the same
observations.
3. Use the MERGE statement when you have a common index variable, and
any new variables or observations.
4. The MERGE statement first requires that you use the SORT procedure
(PROC SORT) to sort on the index variable before merging.
5. Make sure that you add the BY statement after the MERGE statement in
your DATA step or you will have a new dataset that is merged incorrectly.
6. PROC SQL is an advanced method of merging data that can be very
powerful for large datasets. It uses different kinds of "JOINS" that I will
provide more information on in a later video.
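Notes 3 to 6 can be illustrated with a short sketch. The data set more_info and its values are assumptions, chosen so that x = 2 and x = 3 have no match in main. The PROC SQL step at the end is only one possible join (an inner join), shown for comparison.

```sas
data main;
  input x y z;
  cards;
1 2 3
7 8 9
;
run;

/* Hypothetical second data set sharing the index variable x */
data more_info;
  input x a b c;
  cards;
1 20 40 60
2 11 12 13
3 14 15 16
7 10 20 30
;
run;

/* MERGE with BY requires both inputs to be sorted by the index variable */
proc sort data=main;      by x; run;
proc sort data=more_info; by x; run;

data merged;
  merge main more_info;
  by x;
run;

/* Inner join: keeps only the x values found in both data sets */
proc sql;
  create table joined as
  select m.*, i.a, i.b, i.c
  from main as m inner join more_info as i
  on m.x = i.x;
quit;

proc print data=merged; run;
proc print data=joined; run;
```

In merged, the observations with x = 2 and x = 3 have missing values (.) for y and z, because main has no rows for them. The inner join in joined drops those two observations instead.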
Lesson 5:
https://www.youtube.com/watch?v=Jj8WOndCNC8
Data Reduction and Data Cleaning:
Main Code:
data main;
input x y z;
cards;
1 2 3
7 8 9
;
run;
proc contents data=main; run;
proc print data=main; run;
/* Reduce data in a PROC step by using the WHERE statement */
proc print data=main;
where x = 1;
run;
proc print data=main;
run;
proc print data=reduced_main;
run;
The output will be:
Obs   X   Y   Z
1     1   2   3
/* 3. Reduce data in the DATA STEP by KEEPing only the variables
you do want */
data reduced_main;
set main;
KEEP x y;
run;
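data clean_main;
set main;
label
ID = "Identification Number"
month = "Month of the Year"
day = "Day of the Year";
run;
proc contents data=main;
run;
proc contents data=clean_main;
run;
[The second PROC CONTENTS listing shows the variables ID, month and day with
an extra column called Label, holding the label given to each variable in the
code.]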
So if we run code:
proc freq data=clean_main;
table month;
run;
We get:
Months   Frequency   etc
2        1           etc
8        1           etc
data clean_main;
set clean_main;
format month months.;
run;
[Here we are not changing the stored value (e.g. 1 to January), only
formatting how it is displayed. In the FORMAT statement, month is the variable
and months. (with the trailing dot) is the name of the format.]
Now after running code:
proc freq data=clean_main;
table month;
run;
We get:
Months                 Frequency   etc   Cumulative   etc
February (formatted)   1
August (formatted)     1
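The format months. has to exist before the FORMAT statement can use it. A minimal sketch of how such a format could be defined with PROC FORMAT is given below. Only the month values 2 and 8 appear in the output above, so the remaining value, the sample data and the day values are assumptions made for illustration.

```sas
/* Define a numeric format called months (referenced later as months.) */
proc format;
  value months
    1 = 'January'
    2 = 'February'
    8 = 'August';
run;

/* Hypothetical sample data using the variables named in the labels */
data clean_main;
  input ID month day;
  label ID    = "Identification Number"
        month = "Month of the Year"
        day   = "Day of the Year";
  format month months.;
  cards;
1 2 14
2 8 30
;
run;

/* The frequency table now shows February and August instead of 2 and 8 */
proc freq data=clean_main;
  table month;
run;
```

The stored values of month are still the numbers 2 and 8. Only the displayed text changes, which is why a WHERE statement would still compare against the numeric value.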
Lesson 6:
https://www.youtube.com/watch?v=p_a0WP74lCQ
Main Code:
data main;
input x y;
cards;
1 2
3 4
5 6
7 8
;
run;
proc print data=main;
run;
Output is:
Obs   X   Y
1     1   2
2     3   4
3     5   6
4     7   8
Now, to create new variables and perform arithmetic operations on the existing
ones, we write the code:
data new_main;
set main;
a = x + y;   /* addition */
b = x - y;   /* subtraction */
c = x * y;   /* multiplication */
d = x / y;   /* division */
e = x ** y;  /* exponentiation (x to the power y) */
f = ((x + y) * (x - y));
run;
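Variables (partial PROC CONTENTS listing):
#   Variable
6   d
7   e
Here the variables are sorted alphabetically, but with the VARNUM option on
the PROC CONTENTS statement they are listed in the order they were created:
#   Variable
6   d
7   e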
run;
Output:
Obs   X   Y   A    B    C    D        E         F
1     1   2   3    -1   2    0.5      1         -3
2     3   4   7    -1   12   0.75     81        -7
3     5   6   11   -1   30   0.8333   15625     -11
4     7   8   15   -1   56   0.875    5764801   -15
Helpful Notes:
1. SAS uses many of the same arithmetic operators to add, subtract, divide
and multiply as other programming languages and basic algebra.
2. Arithmetic operations on variables affect the entire list of observations. So
be careful in operating with existing variables and make new variables if you
can afford to.
3. The varnum option on the PROC CONTENTS statement can allow you to
see the variables listed in the order they were created.
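Note 3 can be tried with a small sketch like the one below. It uses a shortened version of the lesson's data and creates only the variables a and d, so the data values and the reduced variable list are assumptions for illustration.

```sas
data main;
  input x y;
  cards;
1 2
3 4
;
run;

/* New variables are added after x and y, in the order they are assigned */
data new_main;
  set main;
  a = x + y;
  d = x / y;
run;

/* Default listing: variables in alphabetical order (a, d, x, y) */
proc contents data=new_main;
run;

/* VARNUM: variables in creation order (x, y, a, d) */
proc contents data=new_main varnum;
run;
```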
Lesson 7:
https://www.youtube.com/watch?v=UsbDpmUQG9g
Today's Code:
*This example, taken from Huntsberger and Billingsley (1989, p. 290), tests
whether
the mean length of a certain type of court case is more than 80 days by
using 20 randomly chosen cases.;
data time;
input time @@;
cards;
43 90 84 87 116 95 86 99 93 92
121 71 66 98 79 102 60 112 105 98
;
run;
* What does the data look like?;
Output:
Obs   Time
1     43
2     90
3     84
4     87
5     116
6     95
7     86
8     99
9     93
10    92
11    121
12    71
13    66
14    98
15    79
16    102
17    60
18    112
19    105
20    98
****************************;
* 1. In Statistics, we must specify the level of significance (alpha).
* 2. Roughly, you can interpret this as "This conclusion did not occur by
chance."
* 3. The flip-side of level of significance is called confidence (1-alpha).
* 4. It's common practice to use an alpha-value of 5% so we can be "95%
confident" of our results.
RUN;
[The PLOTS(SHOWH0) option asks SAS to mark the null hypothesis value (h0=) on
the plots]
ODS GRAPHICS OFF;
[Turns ODS graphics off again]
*Now in the Results window, under the variable time, there are two more
entries showing the graphics for the test.
The code:
the h0= and sides= options on the PROC statement allow you to specifically
control what you are testing.
3. Specify the level of significance (alpha) using the alpha= option on the
PROC statement.
4. To graphically visualize the data and assess the normality assumption, use
the ODS graphics on and off statements above and below the PROC step to
show the histogram and QQ plots.
5. Lastly, a formal test of the normality assumption (the better route) can be
done using the Anderson-Darling test in PROC UNIVARIATE. Remember that the
null hypothesis is essentially that the data are normally distributed, so
you'll want a non-significant p-value to treat the data as normal.
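The notes above can be tied together in one sketch of the one-sample test on the court case data. The value h0=80 and the upper-tailed alternative come from the example's description; alpha=0.05 follows the 5% convention mentioned earlier and is an assumption about what the lesson actually ran.

```sas
data time;
  input time @@;
  cards;
43 90 84 87 116 95 86 99 93 92
121 71 66 98 79 102 60 112 105 98
;
run;

ods graphics on;

/* H0: mean = 80 against the upper-tailed alternative mean > 80 */
proc ttest data=time h0=80 sides=u alpha=0.05 plots(showh0);
  var time;
run;

/* Formal normality tests, including Anderson-Darling */
proc univariate data=time normal;
  var time;
  histogram time / normal;
run;

ods graphics off;
```

The NORMAL option on the PROC UNIVARIATE statement prints the table of normality tests, and the NORMAL option on the HISTOGRAM statement adds the fitted normal curve with its goodness-of-fit tests.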
Lesson 8:
https://www.youtube.com/watch?v=ULq9kQtIDMQ
Today's Code:
/* I. PAIRED t TEST EXAMPLE */
/* In this example, taken from the SUGI Supplemental Library User's Guide,
Version 5 Edition, a stimulus is being examined to determine its effect on
systolic blood pressure. Twelve men participate in the study. Each man's
systolic blood pressure is measured both before and after the stimulus is
applied.
*/
data pressure;
input SBPbefore SBPafter @@;
datalines;
120 128 124 131 130 131 118 127
140 132 128 125 140 141 135 137
126 118 130 132 126 129 127 135
;
run;
Output:
Obs   SBPbefore   SBPafter
1     120         128
2     124         131
3     130         131
4     118         127
5     140         132
6     128         125
7     140         141
8     135         137
9     126         118
10    130         132
11    126         129
12    127         135
Output:
You will get the results; look at the p-value to decide whether to reject the
null hypothesis at the 5% level of significance.
Output:
Here the histogram and the QQ plot show the data to be approximately normal.
This check should always be done, as the test is valid only for normally
distributed data.
* Test for Normality of Each Variable Separately;
Output:
/* In the following example, the golf scores for males and females in a
physical education class are compared. The sample sizes from each
population are equal, but this is not required for further analysis. The scores
are thought to be approximately normally distributed within gender.*/
data scores;
input Gender $ Score @@;
cards;
f 75 f 76 f 80 f 77 f 80 f 77 f 73
m 82 m 80 m 85 m 85 m 78 m 87 m 82
;
run;
The Output:
Obs   Gender   Score
1     f        75
2     f        76
3     f        80
4     f        77
5     f        80
6     f        77
7     f        73
8     m        82
9     m        80
10    m        85
11    m        85
12    m        78
13    m        87
14    m        82
Output:
You will get the results, which contain several subfolders. There you can see
whether the variances of the two samples are equal, choose the relevant test
on that basis, and decide whether to reject the null hypothesis by comparing
the p-value with the 5% level of significance.
Output:
Here the histogram and the QQ plot show the data to be approximately normal.
This check should always be done, as the test is valid only for normally
distributed data.
Output:
*Now in the Results section, under the variable time, there is a histogram
subfolder containing a normal fit folder; in it, click on goodness of fit.
Here SAS uses 3 tests to assess the normality of the data. In the
Anderson-Darling test, look at the p-value and compare it with the 5% level of
significance to get the result.
Helpful Notes:
1. The TTEST procedure requires the data be normally distributed.
2. The PAIRED statement is required when you want to compare two
dependent measurements.
3. The CLASS statement is used when you want to compare measurements
of a variable from two different groups (e.g. gender differences).
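Notes 2 and 3 correspond to two different forms of the PROC TTEST step. A sketch of both, using the lesson's two data sets, might look like this; alpha=0.05 is an assumption that follows the 5% level used in the text.

```sas
data pressure;
  input SBPbefore SBPafter @@;
  datalines;
120 128 124 131 130 131 118 127
140 132 128 125 140 141 135 137
126 118 130 132 126 129 127 135
;
run;

data scores;
  input Gender $ Score @@;
  cards;
f 75 f 76 f 80 f 77 f 80 f 77 f 73
m 82 m 80 m 85 m 85 m 78 m 87 m 82
;
run;

ods graphics on;

/* Dependent measurements: PAIRED tests the mean of SBPbefore - SBPafter */
proc ttest data=pressure alpha=0.05;
  paired SBPbefore*SBPafter;
run;

/* Independent groups: CLASS names the grouping variable */
proc ttest data=scores alpha=0.05;
  class Gender;
  var Score;
run;

ods graphics off;
```

The two-sample output reports both a pooled (equal variances) and a Satterthwaite (unequal variances) result, together with a test for equality of variances that helps decide which one to read.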
Lesson 9:
https://www.youtube.com/watch?v=bEXKgIJSfzg&list=PL7CB9B66A2F4FB9B3
Macro Coding and Macro Variables:
Today's Code:
data main;
input ID var1 var2;
cards;
1 2 3
2 4 5
3 6 7
4 8 9
;
run;
proc contents data=main; run;
The data set main contains:
Obs   ID   var1   var2
1     1    2      3
2     2    4      5
3     3    6      7
4     4    8      9
%let newvar = var3;
[The % sign marks a macro statement. %let creates a global macro variable,
newvar, holding the text var3, which SAS can call at any time.]
/* 2. Use the & operator to call a macro variable */
data new_main;
set main;
&newvar = var1+var2;   /* resolves to var3 = var1+var2: the macro way of doing it */
run;
The data set new_main contains:
Obs   ID   var1   var2   var3
1     1    2      3      5
2     2    4      5      9
3     3    6      7      13
4     4    8      9      17
%MACRO transform_this(x);
&x._squared = &x ** 2;   /* square of x */
&x._cubed = &x ** 3;     /* cube of x */
&x._inverse = 1 / &x;    /* inverse of x */
%MEND transform_this;
data newer_main;
set new_main;
%transform_this(var1);
run;
proc print data=newer_main;
run;
Output:
Obs   ID   var1   var2   var3   var1_squared   var1_cubed   var1_inverse
1     1    2      3      5      4              8            0.5
2     2    4      5      9      16             64           0.25
3     3    6      7      13     36             216          0.1667
4     4    8      9      17     64             512          0.125
Output:
It shows the contents of the data set main & new_main respectively.
/* 5. Create a macro to run the PRINT procedure on any data set */
%MACRO print_this(data_set);
proc print data=&data_set;
run;
%MEND print_this;
%print_this(main);
%print_this(newer_main);
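Output:
Obs   ID   var1   var2
1     1    2      3
2     2    4      5
3     3    6      7
4     4    8      9
and
Obs   ID   var1   var2   var3   var1_squared   var1_cubed   var1_inverse
1     1    2      3      5      4              8            0.5
2     2    4      5      9      16             64           0.25
3     3    6      7      13     36             216          0.1667
4     4    8      9      17     64             512          0.125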
Helpful Notes:
1. There are two places one can use macro variables: within a macro, and
globally outside of all STEPS.
2. The ampersand operator, &, is used to call (resolve) a macro variable
anywhere, including a macro's own parameters inside the macro.
3. The %let statement allows you to define macro variables outside of a
macro, though the & operator still must be used to call the macro variable
elsewhere.
4. The MACRO statement begins the definition of a macro and the MEND
ends the definition.
5. It is optional to restate the macro name after the MEND statement.
6. A macro can be thought of as a function, where one passes something into
the function and certain things are returned. Unlike a function, however,
macros do not always have to return something.
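The notes above can be seen working together in a compact sketch. The macro add_double and the variable it creates are hypothetical names introduced only for this illustration, and the data is a shortened version of the lesson's data set.

```sas
data main;
  input ID var1 var2;
  cards;
1 2 3
2 4 5
;
run;

/* Global macro variable, defined outside of any step */
%let newvar = var3;
%put NOTE: newvar resolves to &newvar;

/* Hypothetical macro with one parameter; & resolves it inside the macro */
%macro add_double(x);
  &x._double = &x * 2;
%mend add_double;

data new_main;
  set main;
  &newvar = var1 + var2;   /* becomes var3 = var1 + var2 */
  %add_double(var1)        /* generates var1_double = var1 * 2; */
run;

/* A macro need not return anything: this one only runs a procedure */
%macro print_this(data_set);
  proc print data=&data_set;
  run;
%mend print_this;

%print_this(new_main)
```

The %put statement writes the resolved value to the log, which is a quick way to check what a macro variable contains before using it in a step.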
Lesson 10:
https://www.youtube.com/watch?v=XCtJdcpImo&index=12&list=PL7CB9B66A2F4FB9B3
Helpful Notes:
1. Enhanced Editor configuration
General:
Change the Tabs to 2 spaces and replace tabs with spaces.
This can help you organize the code as you write it out.
Appearance:
File Elements - they can be re-colored to suit your style.
2. Function Keys and using them. A discussion on the "Run" button (F8).
3. Using keys to navigate within SAS
F5 is program
F6 is log
F7 is output
I would remap those so they're in order left to right.
4. Preferences:
Specifically, check out the Results section
SAS 9.3 changes the default from LISTING type to HTML
You can change this back to LISTING and SAS 9.3 should operate as it did in
SAS 9.2 and previous versions.
There is one benefit of using SAS 9.3's HTML style, in that you can directly
export specific output tables to Excel or other supported systems such as
Microsoft OneNote. I don't use SAS 9.3 so I can't show this.
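The Results preference in item 4 can also be switched in code for a single session. The following is a minimal sketch, assuming SAS 9.3 or later where HTML is the default destination; sashelp.class is a sample data set shipped with SAS and is used here only to have something to print.

```sas
/* Switch from the HTML destination back to the classic LISTING output */
ods html close;
ods listing;

proc print data=sashelp.class;
run;

/* Restore the HTML destination afterwards */
ods listing close;
ods html;
```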