Cognizant 500 Glen Pointe Center West Teaneck, NJ 07666 Ph: 201-801-0233 www.cognizant.com
Handout - SAS
TABLE OF CONTENTS
Introduction
  About this Module
  Target Audience
  Module Objectives
  Pre-requisite
Session 02: Introduction to SAS / Getting Started
  Learning Objectives
  Introduction to SAS Programming Language
  BASE SAS Software
  Why SAS?
  Multi Vendor Architecture (MVA)
  Applications
  Overview of SAS Products
  Getting Started
  Steps of a SAS Program
  DATA Step vs. PROC Step
  Flow Diagram of a SAS Program
  Data types in SAS
  Summary
  Test your Understanding
Session 03: Getting Started
  Learning Objectives
  Missing Value Representation in SAS
  SAS Programming Rules
  Rules for Creating Variable Names
  My First SAS Program
  SAS Windowing Environment
  Try It Out
  Summary
  Test your Understanding
Session 04: Basic Concepts
  Learning Objectives
Page 2 Copyright 2007, Cognizant Technology Solutions, All Rights Reserved C3: Protected
  _N_ & _ERROR_
  Program Data Vector (PDV)
  DATA Step's Built-in Observation Loop
  SAS Program Flow of Execution
  Reading from External File
  Try It Out
  Summary
  Test your Understanding
Session 05: Basic Concepts/Working with the DATA Step
  Learning Objectives
  Variable Declaration
  Reading same record more than once
  Scope of DATA and PROC Steps
  Operators in SAS
  Commenting in SAS
  SAS Data Libraries
  Reading a SAS Dataset
  Try It Out
  Summary
  Test your Understanding
Session 07: Working with the DATA step
  Learning Objectives
  Dataset Options and Options Statement
  SAS Informats & Formats
  Working with SAS Date and Time
  Styles of input
  Writing to an external file
  Try It Out
  Summary
  Test your Understanding
Session 09: SAS Procedures
  Learning Objectives
  SAS Procedures
  PROC PRINT
  PROC CONTENTS
  PROC SORT
  PROC FORMAT
  PROC DATASETS
  Try It Out
  Summary
  Test your Understanding
Session 11: SAS Programming Concepts
  Learning Objectives
  Retaining Variable Values
  Automatic Variables
  Titles and Footnotes
  Conditional Processing
  Iterative Processing
  Conditional Iterative Processing
  Other Data Step statements
  Try It Out
  Summary
  Test your Understanding
Session 13: SAS Programming Concepts/Built-in Functions in SAS
  Learning Objectives
  SAS ODS
  Arrays in SAS
  Arithmetic Functions
  String Functions
  Try It Out
  Summary
  Test your Understanding
Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets
  Learning Objectives
  Date Time Functions
  Combining Vertically
  Concatenating
  Interleaving
  Combining Horizontally
  One-to-one reading
  One-to-one merging
  Match merging
  Updating
  Performing JOINS in DATA Step
  Try It Out
  Summary
  Test your Understanding
Session 18: Statistical Procedures
  Learning Objectives
  PROC FREQ
  Multi-Threaded Processing
  PROC MEANS
  PROC SUMMARY
  PROC REPORT
  Try It Out
  Summary
  Test your Understanding
Session 20: PROC SQL
  Learning Objectives
  PROC SQL Basics
  The SELECT Statement and its Clauses
  Creating Output Tables
  Summarizing & Grouping Data
  Querying Multiple Tables
  Limiting no of rows to be read and displayed
  Using Operators in PROC SQL
  Calculated Values
  Enhancing Query Output
  CONCLUSION
  Try It Out
  Summary
  Test your Understanding
Session 22: Introduction to MACROS
  Learning Objectives
  SAS Macro
  Advantages of the SAS Macro Facility
  Macro variables
  Automatic and User defined macro variables
  Macro Processor and the flow of execution
  Creating macro variables in run time
  Try It Out
  Summary
  Test your Understanding
Session 23: Introduction to MACROS
  Learning Objectives
  Macro Programs
  Using Macro Parameters
  Scope of Macro variables
  System Options
  Condition execution in Macro
  Iterative processing in Macro
  Built-in Macro Functions
  Try It Out
  Summary
  Test your Understanding
Session 25: Help on SAS
  Learning Objectives
  Debugging SAS Programs
  Creating Efficient SAS Codes
  Summary
  Test your Understanding
References
  Websites
  Books
STUDENT NOTES:
Introduction
About this Module
This handout:
- Introduces the SAS programming language
- Explains the basic concepts in BASE SAS
- Touches on the advanced concepts in BASE SAS
Target Audience
Entry Level Trainees
Module Objectives
After completing this module, you will be able to:
- Explain the SAS language
- Describe the basic concepts in SAS
- Work with the DATA step
- Explain procedures in SAS
- Explain SAS programming concepts
- Describe built-in functions in SAS
- Work with SAS Data Sets
- Work with statistical procedures
- Work with PROC SQL
- Describe MACROS
Pre-requisite
The trainee should have basic knowledge of at least one programming language.
The four basic tasks (access, manage, analyze, and present) are described below:
- Access data from almost any source and in any format
    o You can read and write data ranging from text or CSV (comma-separated values) files to powerful databases such as Oracle and DB2.
- Manage the contents of the data
    o SAS manages the contents of the data and stores them in a special form called a SAS dataset.
- Analyze the data
    o You can use the SAS programming language or the built-in programs (procedures) to perform different kinds of analysis on the data.
- Present the analyzed reports in a variety of formats
    o Finally, you can present the analyzed reports in a variety of formats, including text and graphical formats.

Many software applications are either totally menu driven or totally command driven (enter a command, see the result). Base SAS software is neither. With Base SAS software, you use statements to write a series of instructions called a SAS program, which communicates with the SAS System. This module introduces Base SAS software programming concepts.
Why SAS?
- The SAS System enables you to access data in almost any format, no matter where or how they are physically stored.
- It can access data stored in different databases, as well as data on different computers, through engines.
- You can use its data management facility to update, combine, rearrange, edit, or subset data before analysis.
- Its power, flexibility, and ease of use enable you to gain strategic control of all your data processing needs.
- The SAS System has a collection of ready-to-use programs called procedures for analyzing and presenting data in a variety of formats according to the user's requirements.
- It also has many statistical procedures for performing statistical analysis.
- It provides an extensive inventory of application development tools.
Applications
Applications of SAS are diverse. Some of the fields where SAS is applied are given below:
- Data warehousing and data mining
- Clinical research and trials, in the development and testing of drugs
- Banking and pharmaceuticals
- Statistical and mathematical analysis
- Business forecasting and decision support
- Operations research and project management
- Report writing and graphics
- Applications development

SAS System analysis tools range from simple statistics to specialized analysis for econometrics and forecasting, statistical design, and operations research.
SAS/AF: Allows you to write your own interactive SAS applications. Applications written with SAS/AF software give users quick and easy access to information without knowing the SAS language.

SAS/ASSIST: A menu-driven front end to SAS software. You make choices from menus, and SAS writes the program for you. Programs can be stored for later use.

SAS/CONNECT: Connects computers running SAS software. Data can be shared between the computers, and programs developed on one computer or operating environment can be transferred to another for processing.

SAS Enterprise Guide: Provides a graphical user interface to SAS. This is a Windows-only product, but it can be used to access SAS servers on other systems.

SAS Enterprise Miner: A data mining tool that is a complete product in itself. It provides an easy-to-use front end to the SEMMA (Sample, Explore, Modify, Model, Assess) process for business users.

SAS/GRAPH: Produces high-resolution plots, charts, and maps.

SAS/MDDB Server: Allows you to save data in multidimensional database (MDDB) formats for use with online analytical processing (OLAP), otherwise known as slicing and dicing your data.

SAS/STAT: Provides a number of procedures for statistical analysis, such as analysis of variance, regression, multivariate analysis, and categorical data analysis.

SAS/Warehouse Administrator: Simplifies the creation and maintenance of data warehouses.

SAS Enterprise Business Intelligence Server: Includes both a suite of business intelligence (BI) tools and a platform that provides uniform access to data. The goal of this product is to compete with popular reporting tools such as Business Objects and Cognos. SAS Business Intelligence gives you the information when you need it, in the format you need.

The SAS Difference: Other vendors provide business intelligence solely in the form of historical reports that give you hindsight but limited insight. SAS Business Intelligence allows you to understand the past, monitor the present, and predict outcomes as you move your business ahead.
SAS/ETL (Extraction, Transformation and Loading): Extract, cleanse, transform, load, and manage data from a single environment. SAS provides integrated ETL capabilities that enable organizations to extract, transform, and load data from across the enterprise to create consistent, accurate information.

SAS is a modular product: it requires certain modules, such as BASE SAS, in order to run. Once the BASE SAS module is installed, you can add whichever additional modules you need to extend SAS. For example, the SAS/STAT module adds the capability for statistical analysis, SAS/GRAPH adds the capability for high-resolution graphics, and so forth.
Getting Started
SAS Datasets: SAS's own way of storing data

Before you can analyze your data and produce a report with SAS software, the data must be in a special form that the SAS System can understand. This form is called a SAS data set. It consists of two portions:
- Descriptor information
- Data values
Descriptor Information:
The descriptor information describes the contents of the SAS dataset to the SAS System. It contains information such as:
- Dataset name
- Date created/modified
- Version number of the SAS System
- Number of variables and observations
- Information about each variable: name, data type, length, position within the dataset, and so on
Data Values:
The Data values or the Data portion contains the actual data that have been collected. The data is organized into a rectangular structure containing rows called observations and columns called variables. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
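The two portions can be inspected separately. The sketch below is illustrative only: the data set name students and its variables and values are assumptions, not part of the handout. PROC CONTENTS reports the descriptor portion, and PROC PRINT lists the data portion.

```sas
/* Build a small data set; the name STUDENTS and its variables are illustrative. */
data students;
   input name $ age height;
   datalines;
Asha 14 62.5
Ravi 15 66.0
Mina 13 59.8
;
run;

/* Descriptor portion: data set name, creation date, variable names, types, lengths. */
proc contents data=students;
run;

/* Data portion: the observations (rows) and variables (columns) themselves. */
proc print data=students;
run;
```

Here each data line becomes one observation, and name, age, and height are the variables.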
DATA Step:
- The DATA step reads data from almost any source, from text or CSV files to databases such as Oracle and DB2.
- You can combine existing SAS datasets in a DATA step.
- You can transform and analyze the data.
- You can write programming statements to modify the data.
- Finally, you can write out the processed data to a SAS dataset or an external file.
PROC Step:
- PROC stands for procedure.
- PROC steps generally operate on SAS datasets rather than on other kinds of files.
- A PROC step takes a SAS dataset, analyzes the data, and generates results or reports.
- It can also produce results in graphical form, such as graphs and charts.
- The results can be written to an output SAS dataset as well.

There can be any number of DATA or PROC steps in a SAS program. A typical program starts with a DATA step to create a SAS data set and then passes the dataset to a PROC step for processing. Here is a simple program that converts miles to kilometers in a DATA step and prints the results with a PROC step:
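A minimal sketch of such a miles-to-kilometers program follows. The data set name distance, the variable names, the sample value, and the rounded conversion factor are assumptions chosen for illustration.

```sas
/* DATA step: create a data set with one observation. */
data distance;
   miles = 26.22;
   kilometers = 1.61 * miles;   /* approximate conversion factor */
run;

/* PROC step: print the data set created above. */
proc print data=distance;
run;
```

The DATA step builds the data set and the PROC step only reads it, which is the usual division of labor between the two kinds of step.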
1. Raw data is given as input to the SAS DATA step.
2. The DATA step reads the data using SAS statements and creates a SAS dataset as output.
3. The created SAS dataset is given as input to the SAS procedure (PROC) step.
4. The PROC step generates the reports.
Summary
SAS provides a complete application development environment that caters to the four basic tasks: ACCESS, MANAGE, ANALYZE and PRESENT data.
The SAS System is an integrated system of software products, and the core of the SAS System is Base SAS software.
MVA makes SAS platform independent. It facilitates applications that run on more than one computing environment.
SAS is used in almost all fields.
SAS licenses many different products. Most of the products are integrated, so you don't have to convert datasets or start up another program to use the other products.
A SAS dataset is SAS's own way of storing data. It consists of two portions: descriptor information and data values.
SAS programs are constructed from two basic building blocks: the DATA step and the PROC step.
There are only two data types available in SAS: numeric and character.
My First SAS Program
DATA EMP;
   INPUT EMPID NAME $ SAL;
   OUTPUT EMP;
   DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
PROC PRINT DATA = EMP;
RUN;
Explanation:
INPUT statement: The INPUT statement reads each data line (observation) and assigns values to the SAS variables that correspond to the data fields. Because NAME is a character variable, it is followed by $.
DATALINES statement: Use the DATALINES statement with an INPUT statement to read data entered in the program rather than from an external file. The DATALINES statement marks the end of the DATA step statements and the beginning of the input data values. DATALINES assumes that the data follows immediately, that is, the data is 'instream' (within the program). You can also use the CARDS statement instead of DATALINES; both statements behave the same way.
Guidelines:
1. DATALINES must be the last statement in the DATA step (that is, place the DATALINES statement directly before the first data line). When the compiler comes across the DATALINES; statement, it reads the subsequent lines as data rather than as source code.
2. Terminate the data with a semicolon on a new line.
OUTPUT statement: Writes the values of the variables EMPID, NAME and SAL to the dataset EMP.
PROC PRINT procedure: The PRINT procedure prints the contents (data portion) of the SAS dataset.
Program Editor
We can use the Program Editor window to enter, edit, and submit SAS programs. SAS color-codes the different parts of the program. The extension of a SAS program file is .sas
Log window
The Log window displays:
Messages about the SAS session
How the SAS program was executed
Notes, errors, or warnings issued during the execution of a SAS program
Time taken by the SAS System to process the program
The extension of the log file is .log
Output window
The Output window displays the output of the SAS programs that we submit. Extension of the output file is .lst
If we create HTML output, it will be opened in Results Viewer window, which is the internal browser for SAS.
Explorer window
The Explorer window gives easy access to SAS files and libraries. Use this window to:
View and manage SAS files
Create new SAS libraries and SAS files
Open any SAS file
Perform most file management tasks such as moving, copying, and deleting files
Create file shortcuts
Results window:
The Results window is a table of contents for the Output window. The result tree lists each part of the results in outline form. It helps us to navigate and manage the output from the SAS programs that we submit. We can view, save, and print individual items of output.
Code
DATA ALL;
   INPUT ID $ HR SBP DBP;
   OUTPUT ALL;
   DATALINES;
A1 68 130 80
B3 101 148 86
C2 . . 72
D1 72 140 88
;
RUN;
Refer File Name: 3.1.sas to obtain soft copy of the program code
How It Works
The INPUT statement reads the values of ID, HR, SBP and DBP respectively and stores them in the PDV. The OUTPUT statement writes the values of the user-defined variables to the dataset.
Summary
A character missing value is represented by spaces.
A numeric missing value is represented by a period.
SAS programming rules
Rules for creating variable names
SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks we need to do.
_N_:
The _N_ variable counts the number of times the DATA step begins to iterate. Initially it is set to 1, and for each iteration it is incremented by 1. It behaves like a record counter: while reading the first record _N_ is 1, while reading the nth record _N_ is n, and so on.
DATA step iteration no.   1   2   10   n
_N_ value                 1   2   10   n
_ERROR_:
The _ERROR_ variable signals the occurrence of an error caused by the data during execution. By default it is set to 0; if a data error occurs, it is set to 1. For example, a data error occurs when the INPUT statement finds a character value where a numeric variable is expected.
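The behaviour of the two automatic variables can be observed by writing them to the log. The following is a small sketch; the dataset name CHECK and the sample data are assumptions made for illustration, and the second data line deliberately contains a character value for the numeric variable AGE.

```sas
DATA CHECK;
   INPUT ID AGE;
   /* PUT writes to the log; _N_ and _ERROR_ are not stored in CHECK */
   PUT 'Iteration: ' _N_= _ERROR_=;
   DATALINES;
1 25
2 XX
3 40
;
RUN;
```

In the log, _N_ increases by 1 for each record, and _ERROR_ is 1 only for the iteration that reads the invalid value; AGE is set to missing for that observation.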
Initially, all variables in the PDV are set to missing except _N_ and _ERROR_: _N_ is set to 1 and _ERROR_ is set to 0. Every variable is marked as either KEEP or DROP. The automatic variables _N_ and _ERROR_ are always dropped, so they are never written to the output dataset.
When the program encounters an OUTPUT statement, or when the end of the DATA step is reached:
All values in the PDV, except those marked to be dropped, are written as a single observation to the output dataset.
The system returns to the DATA statement to begin the next iteration.
All the variables are reset to missing except _N_ and _ERROR_. _N_ is incremented by 1.
Input Buffer: If the program is reading from an external source, SAS creates a temporary buffer space called the input buffer. The input buffer holds the current record being processed. Its default length is 256 characters (the default is larger in more recent SAS releases) and can be changed using the LRECL option.
Understanding the PDV: Consider the program below.
DATA EMP;
   INPUT EMPID NAME $ SAL;
   NEWSAL = SAL + 100;
   OUTPUT EMP;
   DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
When the above program is submitted, SAS allocates memory for Input buffer and PDV.
Step 1: During the program execution, SAS reads the first record and stores it in the Input buffer. The Input pointer is positioned at the beginning of the Input buffer. The following figure shows the position of the input pointer in the input buffer before SAS reads the data.
Step 2: The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV where they become variable values. After reading the first value the input pointer moves to the beginning of next value in the input buffer, from there the INPUT statement reads the value for second variable and so on. The below figure illustrates the process.
Step 3: After the INPUT statement reads a value for each variable, the next statement is executed. SAS computes a value for the variable NEWSAL from SAL and writes it to the PDV. All the programming statements read and write the values of variables from the PDV.
Step 4: When SAS encounters the OUTPUT statement, or when it executes the last statement in the current DATA step, all the values in the PDV except those marked as DROP are written as a single observation to the dataset EMP.
Step 5: Before the next record is read, all the variables are set to missing except the automatic variables _N_ and _ERROR_. _N_ is incremented by 1, so it becomes 2. _ERROR_ is 0, since no error has occurred.
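The steps above can be traced in the log with the PUT _ALL_ statement, which writes every variable in the PDV, including _N_ and _ERROR_. This is a sketch based on the same program; the two PUT statements and their label text are additions for illustration only.

```sas
DATA EMP;
   INPUT EMPID NAME $ SAL;
   PUT 'After INPUT:      ' _ALL_;   /* NEWSAL is still missing here */
   NEWSAL = SAL + 100;
   PUT 'After assignment: ' _ALL_;   /* NEWSAL now holds SAL + 100   */
   DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
```

Each iteration writes two lines to the log, so the state of the PDV before and after the assignment can be compared record by record.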
During the compilation phase, SAS:
Checks the syntax of the SAS statements
Establishes an area of memory called the input buffer, if reading from an external source/file
Allocates the memory for the Program Data Vector (PDV)
Assigns the required attributes to variables, such as data type, length and position
Builds the descriptor portion of the new dataset
Converts the SAS code into uppercase
Execution:
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
SAS sets the variables to missing in the program data vector (PDV).
SAS reads a data record from the raw data file into the input buffer and then stores the values in the PDV.
SAS executes any subsequent programming statements for the current record.
When it encounters an OUTPUT statement, or at the end of the DATA step, SAS writes an observation to the SAS dataset.
The system automatically returns to the top of the DATA step.
The same steps continue until there is no record left to be read.
Control flow in DATA step:
At compilation time, SAS builds the descriptor portion of the dataset EMP. At the beginning of execution, SAS reads the first record from the raw file. The record passes through every statement in the DATA step. When SAS encounters an OUTPUT statement, or when the end of the DATA step is reached, the values of the variables are written to the dataset EMP as observation one. When SAS reaches the RUN statement, control goes back to the beginning of the DATA step to read the subsequent records, and SAS reinitializes the PDV. It then checks whether a record is available to read. Since one is available, SAS reads the second record, which follows the same steps as above. Similarly, SAS reads the third record and writes it to the dataset. SAS then checks for the next record in the input file. Since none is available, SAS terminates the current DATA step and control passes to the step that follows the DATA step.
raw-data-file - points to the raw data file being read
options - affect how SAS reads the raw data file
FILENAME statement
Example:
FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';
DATA TEMP;
   INFILE INP;
   INPUT NAME $ SAL;
RUN;
The following options of the INFILE statement affect how the data is read from the external file:
111, ,1000
Enables SAS to read values with embedded delimiters if the value is surrounded by double quotes. Example: Consider the first observation, in which the value for Name is enclosed in double quotes:
111,"RAMESH, KUMAR",1000
When this record is read with the INPUT statement, the values of the variables are:
EMPID = 111
NAME = RAMESH, KUMAR
SAL = 1000
END=
The END= option creates and names a temporary variable that acts as an end-of-file indicator. General Form:
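A sketch of how the DSD and END= options might be combined is shown below. The fileref TMP, the variable names LAST and TOTAL, and the sample records are assumptions; the file is first written to a temporary location so that the example is self-contained.

```sas
FILENAME TMP TEMP;   /* temporary external file */

/* Write three sample records; the first has a quoted value with a comma */
DATA _NULL_;
   FILE TMP;
   PUT '111,"RAMESH, KUMAR",1000';
   PUT '222,,2000';
   PUT '333,RANI,3000';
RUN;

DATA EMP;
   LENGTH NAME $ 20;
   INFILE TMP DSD END = LAST;   /* LAST is 1 only on the final record */
   INPUT EMPID NAME $ SAL;
   TOTAL + SAL;                 /* running total of SAL */
   IF LAST THEN PUT 'Total salary: ' TOTAL;
RUN;
```

With DSD, the two consecutive commas in the second record are read as a missing NAME. The END= variable is temporary and is not written to the dataset EMP.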
RECFM
Specifies the record format of the external file. Usually, the SAS System reads a line of data until a carriage return is encountered. However, sometimes more than one fixed-length record (records with same LRECL) occurs in a single line without carriage return characters. In this case, the option RECFM=F (fixed) needs to be specified to read the data. The default value is RECFM=V (variable) and it considers one record per line.
Here, each line should contain four data values: last name, first name, employee ID and job code. The grayed-out area denotes the actual line lengths. Program:
DATA Test;
   INFILE "d:\infile\emplist.dat" <OPTIONS>;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;
The code was submitted using different options on the INFILE statement.
FLOWOVER
Causes the INPUT statement to jump to the next record if it doesn't find values for all the variables in the current record/line. This is the default option. Program:
DATA Test;
   INFILE "d:\infile\emplist.dat" FLOWOVER;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;
Contents of the Dataset Test (using FLOWOVER):
The INPUT statement expects the data for Jobcode in columns 37-45, but the data value in the 2nd record ends at column 41. The record is therefore considered incomplete, and the INPUT statement goes to the next record and takes SMITH as the value for Jobcode.
MISSOVER
If SAS reaches the end of the line without finding values for all fields, variables without values are set to missing. Program:
DATA Test;
   INFILE "d:\infile\emplist.dat" MISSOVER;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;
Contents of the Dataset Test (using MISSOVER):
The value of Job code in the 2nd record is only 5 chars, but the program is expecting 9 chars. Since the value of Jobcode in the 2nd record is incomplete, SAS assigns a missing value.
TRUNCOVER
This option acts like MISSOVER, but it also uses a partial value to fill the first unfilled variable. Program:
DATA Test;
   INFILE "d:\infile\emplist.dat" TRUNCOVER;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;
Contents of the Dataset Test (using TRUNCOVER):
With the TRUNCOVER option in place, SAS reads all the columns and all the observations correctly. The value of Jobcode in the 2nd record is only 5 characters long, but the program expects 9 characters, so the data is incomplete. In this case:
MISSOVER assigns a missing value to Jobcode.
TRUNCOVER assigns the partial value (only 5 characters) to the unfilled variable Jobcode.
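The difference between the two options can be tried out with a small self-contained sketch. The fileref EMPL, the dataset names TEST_MISS and TEST_TRUNC, and the sample records are assumptions; the second record is deliberately written so that it ends at column 41.

```sas
FILENAME EMPL TEMP;   /* temporary stand-in for emplist.dat */

/* Column pointers (@n) place each field in the expected columns */
DATA _NULL_;
   FILE EMPL;
   PUT @1 'LANGKAMM' @22 'SARAH'   @32 'E0045' @37 'Mechanic1';
   PUT @1 'TORRES'   @22 'JAN'     @32 'E0029' @37 'Pilot';
   PUT @1 'SMITH'    @22 'MICHAEL' @32 'E0065' @37 'Mechanic2';
RUN;

DATA TEST_MISS;
   INFILE EMPL MISSOVER;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;

DATA TEST_TRUNC;
   INFILE EMPL TRUNCOVER;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;

PROC PRINT DATA = TEST_MISS;
RUN;

PROC PRINT DATA = TEST_TRUNC;
RUN;
```

Comparing the two listings shows Jobcode for the short record as missing under MISSOVER and as the partial value under TRUNCOVER.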
Code
FILENAME INP 'C:\SASFILES\VITAL.CSV';
DATA ALL;
   INFILE INP DLM = ',';
   INPUT ID $ HR SBP DBP;
   OUTPUT ALL;
RUN;
Refer File Name: 4.1.sas to obtain soft copy of the program code
How It Works
This is similar to problem 3.1.
It uses a FILENAME statement that creates a file reference to the external file VITAL.CSV.
Since the input file is a CSV (comma-separated-value) file, we use the DLM option.
Summary
_N_ and _ERROR_ are automatic variables created by SAS during program execution.
The PDV is a temporary memory area where the values of variables are stored at execution time.
The code inside the DATA step is repeated to read multiple records. The iteration continues until the end of file is reached.
The SAS System processes the DATA step in two phases: compilation and execution.
An INFILE statement is used to specify the source of the data read by the INPUT statement.
The options of the INFILE statement affect how the data is read from the external file.
Variable Declaration
Variables can be declared using either of the following statements: LENGTH or ATTRIB.
LENGTH:
If not specified, SAS assigns a default length of 8 bytes to Character and Numeric Variables. Using LENGTH statement, we can explicitly assign the length and data type of variables. General Form:
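As a sketch, the LENGTH statement is commonly written as LENGTH variable(s) <$> length; and placed before the first use of the variable. The variable names and data below are illustrative assumptions.

```sas
DATA EMP;
   /* NAME can hold up to 20 characters, DEPT up to 4; SAL is numeric (8 bytes) */
   LENGTH NAME $ 20 DEPT $ 4 SAL 8;
   INPUT EMPID NAME $ DEPT $ SAL;
   DATALINES;
111 LawrenceJames HR 1000
222 Martina FIN 2000
;
RUN;
```

Without the LENGTH statement, list input would store only the first 8 characters of LawrenceJames.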
ATTRIB:
Using the ATTRIB statement, we can associate the following attributes with variables in a single statement:
Length and data type
Label
Informat
Format
General Form:
Attributes:
LENGTH=<$>length
Specifies the length of the variable. $ indicates that it is a character variable.
LABEL='label'
Associates a label with a variable.
INFORMAT=informat
Associates an informat with a variable.
FORMAT=format
Associates a format with a variable.
Example:
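A minimal sketch of an ATTRIB statement that sets several attributes at once; the variable names, labels and data values are hypothetical.

```sas
DATA EMP;
   ATTRIB NAME LENGTH = $20 LABEL = 'Employee Name'
          DOJ  LENGTH = 8   LABEL = 'Date of Joining'
                            INFORMAT = DATE9. FORMAT = DDMMYY10.
          SAL  LABEL = 'Monthly Salary' FORMAT = DOLLAR10.2;
   /* DOJ is read with the DATE9. informat associated above */
   INPUT NAME $ DOJ SAL;
   DATALINES;
RAMESH 15JAN2005 1000
KUMAR 01MAR2006 2000
;
RUN;

/* The LABEL option prints the labels instead of the variable names */
PROC PRINT DATA = EMP LABEL;
RUN;
```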
Single Trailing @
The single trailing @ holds a raw data record in the input buffer until SAS executes an INPUT statement with no trailing @ or reaches the bottom of the DATA step.
The presence of an explicit OUTPUT statement turns off the implicit one.
Fig 1: Observations are written to both the datasets DATA1 and DATA2.
Fig 2: Since an explicit OUTPUT statement is present, observations are not written to the dataset DATA2.
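The two cases can be sketched as follows. The variable X and the data values are assumptions, and the second step is assumed to name DATA1 in its explicit OUTPUT statement.

```sas
/* Case 1: no explicit OUTPUT, so the implicit OUTPUT writes
   every observation to both DATA1 and DATA2 */
DATA DATA1 DATA2;
   INPUT X;
   DATALINES;
1
2
;
RUN;

/* Case 2: the explicit OUTPUT DATA1 turns off the implicit OUTPUT,
   so DATA2 is created with no observations */
DATA DATA1 DATA2;
   INPUT X;
   OUTPUT DATA1;
   DATALINES;
1
2
;
RUN;
```

The notes in the log report the number of observations written to each dataset, which makes the difference easy to confirm.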
Operators in SAS
Operators in SAS are classified into arithmetic operators, comparison operators and logical operators.
Arithmetic Operators
Example of using the IN operator: IF VNUM IN (1,20,55,79,100,500) THEN ...; The condition is true if the value of the variable VNUM is found in the given list.
Logical Operators
Other Operators:
|| -- Concatenation
>< -- Minimum
<> -- Maximum
Concatenation (||) operator: concatenates two character strings. Ex: name = 'Jacob' || 'son';
MIN (><) and MAX (<>) operators: find the minimum or maximum of two values. Ex:
x = a >< b; /* x is the minimum of a and b */
x = a <> b; /* x is the maximum of a and b */
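A short sketch that exercises one operator of each kind and writes the results to the log; all variable names and values are illustrative assumptions.

```sas
DATA _NULL_;
   A = 7;
   B = 3;
   TOTAL   = A + B;               /* arithmetic: addition               */
   POWER   = A ** 2;              /* arithmetic: exponentiation         */
   SMALLER = A >< B;              /* MIN operator                       */
   LARGER  = A <> B;              /* MAX operator                       */
   FOUND   = A IN (1, 7, 20);     /* 1 if A is in the list, otherwise 0 */
   BOTH    = (A > 5 AND B > 5);   /* logical AND: 1 = true, 0 = false   */
   LENGTH NAME $ 8;
   NAME = 'Jacob' || 'son';       /* concatenation                      */
   PUT TOTAL= POWER= SMALLER= LARGER= FOUND= BOTH= NAME=;
RUN;
```

Comparison and logical expressions evaluate to 1 (true) or 0 (false). Note that <> means MAX only in DATA step expressions; in a WHERE clause it is interpreted as "not equal".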
Commenting in SAS
There are two styles of commenting available in SAS: multi-line commenting and single-line commenting.
Multi-line commenting:
For multi-line commenting, enclose the comments in between /* and */ Example:
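A brief sketch of both commenting styles inside a small program (the program itself is an illustrative assumption):

```sas
/* This DATA step creates the dataset EMP.
   A comment of this style can span several lines. */
DATA EMP;
   INPUT EMPID SAL;   /* it can also follow a statement on the same line */
   * A single-line comment starts with an asterisk and ends with a semicolon;
   DATALINES;
111 1000
;
RUN;
```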
General form of a SAS dataset name: a SAS dataset name has two levels, libref.dataset-name.
WORK is a system defined temporary library. All the datasets stored in WORK library will be deleted at the end of the session. If you want to create a permanent dataset, it needs to be created in a user-defined SAS library. Example: The dataset Admit is stored in the user-defined library Clinic.
LIBNAME Statement
This statement is used to create a user-defined library.
General Form: LIBNAME libref 'SAS-data-library';
Where,
SAS-data-library - is the path of a directory on a secondary storage device in which SAS data files are stored.
libref - is a library reference to the above-mentioned directory. It creates a logical link (shortcut) to the SAS-data-library.
Example:
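A minimal sketch of assigning a library and creating a permanent dataset in it. The path is hypothetical and the directory is assumed to exist already; the dataset contents are illustrative.

```sas
/* 'C:\SASFILES\CLINIC' is a hypothetical path; the directory must already exist */
LIBNAME CLINIC 'C:\SASFILES\CLINIC';

DATA CLINIC.ADMIT;   /* permanent: stored in the CLINIC library */
   INPUT ID $ AGE;
   DATALINES;
A1 34
B3 51
;
RUN;

DATA ADMIT_TMP;      /* temporary: same as WORK.ADMIT_TMP */
   SET CLINIC.ADMIT;
RUN;
```

CLINIC.ADMIT remains on disk after the session ends, whereas ADMIT_TMP is deleted along with the rest of the WORK library.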
KEEP / DROP:
By default, SAS writes all variables and observations to the output dataset. Using the dataset options KEEP and DROP, you can make SAS read or write only specific variables. KEEP: The KEEP option names the variables you want to read from a dataset. Example:
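A sketch of both options; the dataset names NAMES and NOSAL and the sample data are assumptions.

```sas
DATA EMP;
   INPUT EMPID NAME $ SAL DEPTID;
   DATALINES;
111 RAMESH 1000 10
222 KUMAR 2000 20
;
RUN;

DATA NAMES;
   SET EMP (KEEP = EMPID NAME);   /* read only EMPID and NAME from EMP */
RUN;

DATA NOSAL (DROP = SAL);          /* write every variable except SAL */
   SET EMP;
RUN;
```

On an input dataset the option controls which variables are read into the PDV; on an output dataset it controls which variables are written.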
DATA ALL;
   SET EMP;
   IF _N_ >= 100 AND _N_ <= 300 THEN OUTPUT ALL;
RUN;
Which method is efficient and why?
END=: We can also use the END=<variable> option with the SET statement.
WHERE statement: To filter the observations from the input dataset:
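One alternative to testing _N_ is the pair of dataset options FIRSTOBS= and OBS=; it is assumed here that this is the other method being compared. With these options SAS reads only observations 100 to 300, whereas the IF _N_ version reads every observation and discards most of them. The sketch below also shows END= and WHERE on a SET statement; the generated data and the names HIGH, LAST and COUNT are illustrative.

```sas
/* Build a sample dataset with 500 observations */
DATA EMP;
   DO EMPID = 1 TO 500;
      SAL = EMPID * 100;
      OUTPUT;
   END;
RUN;

/* Read only observations 100 through 300 */
DATA ALL;
   SET EMP (FIRSTOBS = 100 OBS = 300);
RUN;

/* WHERE filters observations as they are read; LAST flags the final one */
DATA HIGH;
   SET EMP END = LAST;
   WHERE SAL >= 20000;
   COUNT + 1;
   IF LAST THEN PUT 'Observations selected: ' COUNT;
RUN;
```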
Example:
Code
DATA EMP;
   INFILE INP;
   INPUT RECTYP $ @;   /* trailing @ holds the record for the next INPUT */
   IF RECTYP = 'A' THEN DO;
      INPUT NAME $ ID SAL;
   END;
   ELSE DO;
      INPUT ID SAL NAME $;
   END;
RUN;
PROC PRINT DATA = EMP;
RUN;
Refer File Name: 5.1.sas to obtain soft copy of the program code
Code
LIBNAME LIB 'C:\SASFILES\';
DATA LIB.EMPNEW;
   SET EMP (DROP = RECTYP);
   WHERE SAL >= 20000;
RUN;
PROC PRINT DATA = EMP;
RUN;
Refer File Name: 5.2.sas to obtain soft copy of the program code
How It Works
The LIBNAME statement assigns the libref LIB to a permanent library.
The DROP option drops the variable RECTYP while reading.
The WHERE statement selects only the observations whose SAL >= 20000.
Summary
Variables can be declared using the LENGTH or ATTRIB statements.
The single trailing @ holds the current record for the next INPUT statement, until all the values are read from the current record.
Every DATA step has an implicit OUTPUT statement at the end.
The implicit OUTPUT statement does not work if an explicit OUTPUT statement is present.
Operators in SAS are classified into arithmetic, comparison, logical and other operators.
There are two styles of commenting available in SAS.
A SAS data library is a collection of one or more SAS data files.
The SET statement is used to read an existing dataset. Options in the SET statement affect how the data is read.
OPTIONS <options>;
Where, options specifies one or more system options to be changed. Using the options statement you can also control the appearance of the output
LINESIZE=
The LINESIZE= option specifies how many characters each output line should contain
PAGENO=
By default, page numbers start at 1 and are numbered sequentially throughout the SAS session. Use this option to reset the page number or to start the page numbering at any other number. In the following example, the output pages are numbered sequentially beginning with number 3. Example:
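A sketch of such an OPTIONS statement. The LINESIZE value is arbitrary, and SASHELP.CLASS is used only because it is a sample dataset shipped with most SAS installations.

```sas
/* Each output line holds up to 80 characters; page numbering starts at 3 */
OPTIONS LINESIZE = 80 PAGENO = 3;

PROC PRINT DATA = SASHELP.CLASS;
RUN;
```

System options set this way stay in effect for the rest of the session or until they are changed again.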
For example, the value $1,234.56 contains special characters ($ and ,), so SAS does not recognize it as a standard numeric value. To read such non-standard values, we need to use an informat (DOLLAR9.2 in this case) to tell SAS that the input data is in a particular format. SAS then understands the pattern of the data and converts it into a standard numeric value before assigning it to a variable:
$1,234.56 -> DOLLAR9.2 -> 1234.56
A few informats:
w.d - reads standard numeric data
$w. - reads standard character data
$CHARw. - reads character data with blanks
DOLLARw.d - reads a numeric value and removes embedded commas, blanks, dollar signs, percent signs and right parentheses
COMMAw.d - similar to DOLLARw.d
Example:
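A sketch of informats in use; the dataset SALES, its variables and the column positions are assumptions. The data lines are laid out so that ITEM occupies columns 1-10, AMOUNT columns 12-20 and UNITS columns 22-27.

```sas
DATA SALES;
   INPUT @1  ITEM   $CHAR10.    /* character value, blanks preserved */
         @12 AMOUNT DOLLAR9.2   /* strips the $ and the comma        */
         @22 UNITS  COMMA6.;    /* strips the comma                  */
   DATALINES;
Table Lamp $1,234.56 12,500
Chair        $250.00    300
;
RUN;

PROC PRINT DATA = SALES;
RUN;
```

AMOUNT and UNITS are stored as ordinary numeric values, so PROC PRINT shows them without the dollar sign or commas unless a format is applied.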
Format
A format is used to write data in non-standard form. A format is a pattern or instruction that SAS uses to write data values in the output. The general form of a format is the same as that of an informat. The name of a format is also often the same as that of the informat, but the functionality is exactly the reverse. For example, to display the value 1234.56 as $1,234.56 in a report, you can use the DOLLAR9.2 format:
1234.56 -> DOLLAR9.2 -> $1,234.56
A few formats:
w.d - writes standard numeric data
$w. - writes standard character data
$CHARw. - writes standard character data
DOLLARw.d - converts a standard numeric value to DOLLARw.d form and prints it in the output/report
COMMAw.d - similar to the DOLLARw.d format, but does not prefix the $ sign
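A small sketch that writes the same stored value with three different formats; the variable name AMT is an assumption.

```sas
DATA _NULL_;
   AMT = 1234.56;
   PUT AMT= DOLLAR9.2;   /* AMT=$1,234.56 */
   PUT AMT= COMMA9.2;    /* AMT=1,234.56  */
   PUT AMT= 8.2;         /* AMT=1234.56   */
RUN;
```

The value stored in AMT is the same in all three cases; only the way it is written changes.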
Key Concept: Formats alter the external representation of the values of variables stored in SAS data sets. The internal value remains the same, but how we see it, outside of the data set, is controlled by the Format we choose to associate to the variable. Format and Informat statement: Format / Informat Statement is used to associate a format / informat to a variable. General Form:
Format DOB date9. ;
Informat SAL COMMA9.2 ;
Working with SAS Date and Time
SAS stores the Date and Time values in Numeric form.
Date
The SAS System stores date values as integers representing the number of days between January 1, 1960, and the specified date. Dates before January 1, 1960 are stored as negative numbers. The SAS System can represent dates between A.D. 1582 and A.D. 20,000.
Time
The SAS System processes time values by converting them to numbers representing the number of seconds since midnight of the current day. SAS time values are independent of the date.
Displayed Value
01/01/60
31/01/1960
1960/01/31
02JAN60
02JAN1960
December 31, 1959
Sunday, January 1, 1961
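Displayed values like these can be produced by writing stored date values with different date formats. The sketch below is an illustration only: the pairing of stored values and formats is an assumption, chosen so that each line matches one of the displayed values listed above.

```sas
DATA _NULL_;
   D0 = 0; D1 = 1; D30 = 30; DM1 = -1; D366 = 366;
   PUT D0   MMDDYY8.;     /* 01/01/60                */
   PUT D30  DDMMYY10.;    /* 31/01/1960              */
   PUT D30  YYMMDDS10.;   /* 1960/01/31              */
   PUT D1   DATE7.;       /* 02JAN60                 */
   PUT D1   DATE9.;       /* 02JAN1960               */
   PUT DM1  WORDDATE.;    /* December 31, 1959       */
   PUT D366 WEEKDATE.;    /* Sunday, January 1, 1961 */
RUN;
```

The stored value 0 corresponds to January 1, 1960, and -1 to the day before it.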
YEARCUTOFF Option: This system option specifies the first year of a 100-year span used by informats and functions. When a year is specified with only two digits, SAS uses this span to determine the century. The default value is 1920 and can be overridden using the OPTIONS statement.
How it works: When a two-digit year value is read, SAS interprets it based on a 100-year span that starts with the YEARCUTOFF= value.
Example:
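A sketch of the effect of YEARCUTOFF=; the cutoff year 1950, the dataset name DATES and the two sample dates are assumptions chosen for illustration.

```sas
OPTIONS YEARCUTOFF = 1950;   /* 100-year span: 1950 to 2049 */

DATA DATES;
   INPUT DOB MMDDYY8.;
   FORMAT DOB DATE9.;
   DATALINES;
01/01/49
01/01/50
;
RUN;

PROC PRINT DATA = DATES;
RUN;
```

With this setting, the two-digit year 49 falls in 2049 and the two-digit year 50 falls in 1950, because both must lie within the span that starts at the cutoff year.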
DATE & TIME functions:
Function   Typical Use                Result
TODAY      dt = today();              today's date as a SAS date value
TIME       tt = time();               current time as a SAS time value
DATETIME   datetime = datetime();     current date and time as a SAS datetime value
DAY        day = day(date);           day of month (1-31) of the date value
HOUR       hh = hour(time);           hour (0-23) of the time value
WEEKDAY    wkday = weekday(date);     day of week (1-7) of the date value
MONTH      month = month(date);       month (1-12) of the date value
MDY        date = mdy(mon,day,yr);    combines mon, day, yr into a SAS date value
DATEPART   dt = datepart(datetime);   returns the SAS date value from the SAS datetime value
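A sketch applying some of these functions to fixed values so that the results are repeatable; the variable names and the chosen date and time are assumptions.

```sas
DATA _NULL_;
   DT = '15MAR2007'D;           /* date constant */
   TM = '14:30:00'T;            /* time constant */
   D  = DAY(DT);                /* 15 */
   M  = MONTH(DT);              /* 3  */
   WD = WEEKDAY(DT);            /* 5 (Thursday; 1 = Sunday) */
   HH = HOUR(TM);               /* 14 */
   SAME  = MDY(3, 15, 2007);    /* the same SAS date value as DT */
   NOW   = DATETIME();          /* current date and time */
   TODAY = DATEPART(NOW);       /* date portion of NOW   */
   PUT D= M= WD= HH= SAME= DATE9. TODAY= DATE9.;
RUN;
```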
Styles of input
There are different styles of inputs available. They are List Input Column Input Formatted Input
List input
List input uses a scanning method for locating data values. Example:
DATA EMP;
   length name $ 13;
   input Empid name $ Sal;
   cards;
111 LawrenceJames 2000
222 Martina 3000
;
For the list input style:
Data values must be separated by at least one blank or other defined delimiter.
Character values cannot contain embedded blanks when the file is delimited by blanks.
Fields must be read in order.
Data must be in standard numeric or character format, and you should not use informats for reading.
Missing values can be specified only by a period (.).
If the length of the character data is more than 8 characters, SAS reads only the first 8 characters. This behaviour can be overridden by using the LENGTH statement.
Column Input
Column input enables you to read standard data values that are aligned in columns in the data records. To use column input, data values must be in the same column (field lines) for all the records and in standard numeric or character form. Example:
data scores;
   input Empid 1-10 Name $ 11-25 Sal 27-35;
   cards;
111       LawrenceJames   2000
222       Martina         3000
333       George          4000
;
Features:
Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
No period is required for missing data.
Input values can be read in any order, regardless of their position in the record.
Values do not need to be separated by blanks or other delimiters.
Both leading and trailing blanks within the field are ignored.
Data must be within the same columns on all input lines.
Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.
Formatted Input
Formatted input combines the flexibility of using informats with many of the features of column input. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data. This is the most widely used style of input.
General Form: INPUT @n variable-name informat. ...;
Where,
@n: moves the pointer to the starting position of the field.
variable-name: names the SAS variable being created.
informat: specifies how many positions to read and how to convert the raw data into a SAS value.
Example 1:
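data scores;
   /* absolute pointer controls: name 1-15, score1 21-25, score2 33-37 */
   input @1 name $15. @21 score1 comma5. @33 score2 comma5.;
   cards;
James               1,000       1,220
Martina             1,100       1,210
;
Run;
Example 2:
data scores;
   /* relative pointer controls: name 1-15, score1 22-26, score2 35-39 */
   input name $15. +6 score1 comma5. +8 score2 comma5.;
   cards;
James                1,000        1,220
Martina              1,100        1,210
;
Run;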
Can read data in nonstandard form.
Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
No period is required for missing data.
With the use of pointer controls to position the pointer, input values can be read in any order regardless of their positions in the record.
General Form:
DATA TEMP;
   SET EMP;
   FILE OUT;
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2;
RUN;
The above program creates an output file and a dataset. But the goal of this SAS program is to create only a raw data file and not a SAS data set. So it is inefficient to list a data set name in the DATA statement. Using the _NULL_ Keyword: Using the keyword _NULL_ as the data set name causes SAS to execute the DATA step without writing observations to a data set. _NULL_ is a dummy dataset and it will not contain any observations in it. The same program can be re-written to create only an output file. Example:
DATA _NULL_;
   SET EMP;
   FILE OUT;
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2;
RUN;
Code
DATA FIRESTATION;
   INFILE 'FIRE.TXT';
   INPUT @1  CALL_NO 3.
         @5  DATE    MMDDYY8.
         @14 TRUCKS  2.
         @17 ALARM   1.
         @19 AMOUNT  DOLLAR13.2;
RUN;
PROC PRINT DATA = FIRESTATION;
   FORMAT DATE DATE9. AMOUNT COMMA13.2;
RUN;
Refer File Name: 7.1.sas to obtain soft copy of the program code
How It Works
The informats MMDDYY8. and DOLLAR13.2 convert the date and amount values into standard form and assign them to the variables.
The FORMAT statement applies the DATE9. and COMMA13.2 formats to the fields DATE and AMOUNT respectively, so the output appears in the specified format.
Without the FORMAT statement, SAS prints the values of DATE and AMOUNT in standard numeric form.
Problem Statement
Convert the contents of the dataset FIRESTATION (created in problem 7.1) to a CSV (comma-separated-value) file. Apply the following formats to the variables:
DATE - DATE9. format
AMOUNT - COMMA13.2 format
Code
DATA _NULL_;
   SET FIRESTATION;   /* read the dataset created in problem 7.1 */
   FORMAT CALL_NO 3. DATE DATE9. TRUCKS 2. ALARM 1. AMOUNT COMMA13.2;
   FILE 'OUT.CSV' DLM = ',';
   PUT CALL_NO DATE TRUCKS ALARM AMOUNT;
RUN;
Refer File Name: 7.2.sas to obtain soft copy of the program code
How It Works
The FORMAT statement associates the formats with the variables.
The DLM = ',' option specifies that the output file is a comma-separated-value file.
The _NULL_ dataset name suppresses the creation of a dataset.
Summary
The OPTIONS statement changes SAS system options.
An informat is used to read data in non-standard form.
A format is used to write data in non-standard form.
SAS stores date and time values as numeric values.
Based on the YEARCUTOFF value, the SAS System determines the century of a date if the year is specified as a two-digit year.
The different styles of input are list, column and formatted input.
The FILE statement specifies the external output file that will be created.
_NULL_ is a dummy dataset and it will not contain any observations.
The prodid field starts at position 1 and is numeric. The prodname field starts at position 6 and is character. The quantity field starts at position 18 and is numeric.
SAS Procedures
Procedures are a library of built-in programs or utilities for processing datasets and displaying results. PROC step: It begins with the keyword PROC and consists of a group of SAS statements that call and execute a procedure, with a SAS dataset as input. Procedures can use only datasets, not other files. The procedures analyze the data and generate output as reports, charts, graphs, datasets, etc. Most SAS procedures work with the Data portion of the dataset.
PROC PRINT
PRINT procedure prints observations in a SAS dataset using all or some of the variables. The PRINT procedure can be controlled by the following statements and options. General Form:
Options:
LABEL - uses variable labels as column headings (the variable name is the default heading)
SPLIT='split-character' - PROC PRINT breaks a column heading when it reaches the split character and continues the heading on the next line.
Statements:
VAR: Selects the variables that appear in the report and determines their order. If it is not used, SAS prints the values of all the variables.
BY: Produces a separate section of the report for each BY group. The dataset must be sorted by the BY variable before the BY statement is used.
LABEL: Assigns Labels to the variables. The LABEL option must be used with PROC PRINT to print the Labels.
SUM: Prints totals of the numeric variables specified.
Sample Program:
PROC PRINT DATA = EMP LABEL SPLIT = '*';
   VAR EMPID NAME SALARY;
   BY DEPTID;
   SUM SALARY;
   LABEL EMPID  = 'Employee * ID'
         DEPTID = 'Department * ID'
         NAME   = 'Name of * Employee'
         SALARY = 'Salary of * Employee';
RUN;
Sample Output: (listing not reproduced; its callouts mark the LABEL column headings, the BY DEPTID sections, and the SUM totals)
PROC CONTENTS
PROC PRINT prints the Data portion of a dataset, whereas PROC CONTENTS prints the Descriptor portion. PROC CONTENTS describes the structure of the dataset. It displays information at the dataset level and at the variable level.
Dataset level:
Name
SAS version number
Creation/modified date
Number of observations
Number of variables
File size (bytes)
Variable level:
Name
Type
Length
Formats
Position
Label
General Form:
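A minimal sketch of a typical call is given here. It assumes the sample dataset SASHELP.CLASS, which ships with most SAS installations and is not part of this handout; any dataset name can be substituted.

```sas
/* Print the descriptor portion of one dataset.              */
/* SASHELP.CLASS is an assumed sample dataset, used only for */
/* illustration.                                              */
proc contents data=sashelp.class;
run;

/* The VARNUM option lists the variables in their position   */
/* order instead of alphabetical order.                       */
proc contents data=sashelp.class varnum;
run;
```

The first step reports both the dataset-level and the variable-level information listed above.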
PROC SORT
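This procedure sorts the observations in a SAS dataset by one or more variables. It either replaces the existing dataset or writes the sorted observations into a new one. By default it sorts in ASCENDING order. General form:
PROC SORT DATA=input-dataset <OUT=output-dataset>;
   BY <DESCENDING> variable-1 <... variable-n>;
RUN;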
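As an illustration, the following sketch sorts a small dataset into a new one. The dataset EMP, its variables and its values are hypothetical and exist only for this example.

```sas
/* Hypothetical EMP dataset, used only for illustration */
data emp;
   input empid deptid salary;
   datalines;
101 20 4500
102 10 5200
103 20 3900
104 10 4100
;
run;

/* OUT= writes the sorted rows to a new dataset;          */
/* without it, PROC SORT replaces EMP itself.             */
/* DESCENDING applies only to the variable that follows.  */
proc sort data=emp out=emp_sorted;
   by deptid descending salary;
run;

proc print data=emp_sorted noobs;
run;
```

EMP_SORTED lists department 10 before department 20, and within each department the highest salary comes first.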
PROC FORMAT
This procedure is used to create user-defined Formats and Informats for character and numeric variables. General form:
The format name:
can be up to 32 characters long
must begin with a dollar sign ($) if the format applies to character data
cannot be the name of an existing SAS format
cannot end with a number
does not end with a period in the VALUE statement, but takes a period when it is used
range specifies one or more variable values.
label is a text string enclosed in quotation marks.
Example:
proc format;
   value $grade 'A'     = 'Good'
                'B'-'D' = 'Fair'
                'F'     = 'Poor'
                'I','U' = 'See Instructor'
                other   = 'Miscoded';
run;
The keyword OTHER works like an ELSE branch: it matches every value not covered by the other ranges. To create a user-defined Informat, use the INVALUE statement instead of VALUE. User-defined Informats are, however, rarely needed for reading data values.
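To show where the trailing period goes when a user-defined format is used, here is a short sketch. The dataset MARKS and its values are hypothetical, and the format is a shortened version of $grade from the example above.

```sas
proc format;
   value $grade 'A'     = 'Good'
                'B'-'D' = 'Fair'
                'F'     = 'Poor'
                other   = 'Miscoded';
run;

/* Hypothetical data, used only for illustration */
data marks;
   input name $ grade $;
   datalines;
Ann A
Bob C
Cy X
;
run;

/* The format name is written with a trailing period here */
proc print data=marks;
   format grade $grade.;
run;
```

In the listing the grades print as Good, Fair and Miscoded; the stored values A, C and X are unchanged.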
PROC DATASETS
The DATASETS procedure is used to manage SAS files in a SAS data library. With PROC DATASETS, you can:
List the SAS files that are contained in a SAS library
Copy SAS files from one SAS library to another
Rename SAS files
Delete SAS files
Modify attributes of SAS data sets and of variables within the data sets
Create and delete indexes on SAS data sets
A RUN statement executes the statements submitted so far; the DATASETS procedure itself ends with a QUIT statement or when another step begins.
Examples:
1. Prints the descriptor portion of all the datasets in the WORK library.
2. Copies all SAS files from the WORK library to the PERM library
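A minimal sketch of examples 1 and 2 follows. It assumes that the libref PERM has already been assigned with a LIBNAME statement; that assignment is not shown and its path is site-specific.

```sas
/* Assumes the libref PERM has already been assigned */
/* with a LIBNAME statement.                          */

/* 1. Descriptor portion of every dataset in WORK */
proc datasets library=work;
   contents data=_all_;
run;
quit;

/* 2. Copy all SAS files from WORK to PERM */
proc datasets library=work;
   copy out=perm;
run;
quit;
```

COPY takes its input from the procedure's own library (WORK here) unless an IN= option names another one.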
3. Deletes the dataset EMP from the PERM library and renames the dataset DEPTA to DEPTB
proc datasets library=PERM;
   delete EMP;
   change DEPTA = DEPTB;
run;
MODIFY Statement: This statement in the DATASETS procedure is used to change specific dataset or variable attributes. It allows you to specify formats, informats and labels, rename variables, and create and delete indexes. The MODIFY statement works on only one dataset at a time. The following example modifies the dataset income in the COMPANY library by:
Renaming the variable old to new
Adding a label to the variable new
Setting a format for the variable income
Example:
PROC DATASETS LIBRARY=COMPANY;
   MODIFY income;
   RENAME old=new;
   LABEL new='originally called old';
   FORMAT income comma11.2;
RUN;
The MODIFY statement in the DATASETS procedure is also used to create an index on an existing SAS dataset. Without an index, SAS reads and checks the observations in a dataset sequentially when searching.
INDEX: An index is used to find records quickly in a large dataset. For example, suppose you have to search a table on the column Name and the table has no index. In this case SAS begins with the first row and reads through all the rows in the table.
An index is a SAS file that stores the unique values of a specified column in sorted order, together with the location of those values in the table, which lets you access a row directly by value. For example, suppose you have created an index on the column Name. Using the index, SAS accesses the required row(s) directly, without having to read all the other rows.
Creating an index is useful:
When you use a WHERE statement to filter observations
When merging with another dataset
When performing an equijoin in PROC SQL, and so on
This example uses the DATASETS procedure to create a Simple Index.
Example:
proc datasets library=INDRAILWAY;
   modify TRNTKT;
   index create PNRNO / UNIQUE NOMISS;
run;
In the example, a Simple index is created on the PNRNO variable of the TRNTKT dataset in the INDRAILWAY library. The INDEX CREATE statement specifies that an index is to be created; PNRNO is the index (key) variable. The UNIQUE option specifies that key variable values must be unique within the SAS data set. The NOMISS option specifies that no index entries are to be built for observations with missing key variable values.
Code
PROC FORMAT;
   VALUE HTFMT 0-36    = '1'
               37-48   = '2'
               49-60   = '3'
               61-HIGH = '4';
   VALUE WTFMT 0-100    = '1'
               101-200  = '2'
               201-HIGH = '3';
RUN;

PROC SORT DATA = HTWT;
   BY HEIGHT;
RUN;

PROC PRINT DATA = HTWT LABEL;
   BY HEIGHT;
   SUM WEIGHT;
   LABEL ID = 'Employee ID';
   FORMAT HEIGHT HTFMT. WEIGHT WTFMT.;
RUN;

PROC CONTENTS DATA = HTWT;
RUN;
Refer File Name: 9.1.sas to obtain soft copy of the program code
How It Works
The BY statement groups the observations by HEIGHT. Since the data needs to be grouped by HEIGHT, the dataset is first sorted by HEIGHT. The FORMAT statement applies the user-defined formats to HEIGHT & WEIGHT. The LABEL statement applies the label to the variable ID. Since we are using the LABEL statement, we must also use the LABEL option in PROC PRINT to turn on the feature. The SUM statement adds the values of WEIGHT from all the observations.
Summary
PROC PRINT prints observations in a SAS dataset using all or some of the variables.
PROC CONTENTS prints the Descriptor portion of a dataset.
PROC SORT sorts observations in a SAS dataset by one or more variables and either replaces the existing dataset or writes into a new one.
PROC FORMAT lets you define your own formats or informats for character or numeric variables.
The DATASETS procedure is used to manage SAS files in SAS data libraries.
The MODIFY statement in the DATASETS procedure is also used to create an index on an existing SAS dataset.
RETAIN statement:
The RETAIN statement retains the value of a variable in the PDV across iterations of the DATA step. If no initial value is specified, the retained variable is initialized to missing before the first iteration of the DATA step. General Form:
RETAIN TOTSAL 0;
Example:
DATA ALL;
   RETAIN TOTSAL 0;
   SET EMP END = EOF;
   TOTSAL = TOTSAL + SAL;
   IF EOF = 1 THEN OUTPUT ALL;
RUN;
The dataset ALL has one observation and it contains the Total Salary of all the employees in the dataset EMP.
SUM statement:
variable + expression;
Example: TOTSAL + SAL;
In the above example, SAS:
Creates the variable TOTSAL if it is new and initializes it to zero
Automatically retains the value of TOTSAL
Adds the value of SAL to TOTSAL, ignoring missing values
Automatic Variables
Finding the First and Last Observations in a Group: When you use the BY statement along with the SET statement, the DATA step creates two temporary variables for each BY variable, named FIRST.variable and LAST.variable. Their values are either 1 or 0. FIRST.variable and LAST.variable identify the first and last observation in each BY group.
Before the BY statement is used, the input dataset must be sorted by the BY variable. The BY statement in the DATA step enables you to process your data in groups. The DATA step and the values of the automatic variables are given below.
Example: To find the DEPT-wise total salary of all the employees from the data below
The problem can be divided into three steps.
1. Set the accumulating variable to 0 at the start of each BY group.
2. Increment the accumulating variable with a sum statement (automatically retains).
3. Output only the last observation of each BY group.
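The three steps can be sketched as follows. The EMP rows and the variable names DEPT, SAL and TOTSAL are assumptions made for this illustration, not the handout's own data.

```sas
/* Hypothetical EMP data, used only for illustration */
data emp;
   input dept $ sal;
   datalines;
A 1000
B 2000
A 1500
B 700
;
run;

/* BY-group processing needs the data sorted by DEPT */
proc sort data=emp;
   by dept;
run;

data deptsal (keep=dept totsal);
   set emp;
   by dept;
   if first.dept then totsal = 0;   /* step 1: reset at group start     */
   totsal + sal;                    /* step 2: sum statement retains    */
   if last.dept then output;        /* step 3: one row per BY group     */
run;

proc print data=deptsal noobs;
run;
```

With these rows DEPTSAL holds two observations: TOTSAL is 2500 for department A and 2700 for department B.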
PROC PRINT DATA = EMP;
   TITLE2 'Start of PROC PRINT Report';
   TITLE4 'Contents of the Dataset EMP';
   FOOTNOTE3 'End of PROC PRINT Report';
RUN;
Canceling Titles and Footnotes: TITLE and FOOTNOTE statements are global statements. That is, after you define a title or footnote, it remains in effect until you modify it, cancel it, or end the SAS session. The following statements clear the nth title or footnote and all those with a higher number.
TITLE<n>;
FOOTNOTE<n>;
To cancel all the titles or footnotes, specify a null TITLE1 or FOOTNOTE1 statement like,
TITLE1; FOOTNOTE1;
IF <condition1> THEN <Condn1 True Block Statement>;
ELSE IF <condition2> THEN <Condn2 True Block Statement>;
ELSE <False Block Statement>;
If there is more than one statement in a particular block, group them in a DO-END group.
IF <condition> THEN DO;
   <True Block Statement>;
END;
ELSE DO;
   <False Block Statement>;
END;
SELECT-CASE:
You can also use SELECT groups in DATA steps to perform conditional processing. This is similar to the SWITCH-CASE statement in the C language. General form, SELECT group:
SELECT <(select-expression)>;
   WHEN-1 (when-expression-1) statement;
   WHEN-n (when-expression-n) statement;
   <OTHERWISE statement;>
END;
Example: The following code assigns a value to variable Title based on the value of designation.
select (designation);
   when ("PAT") Title = "Programmer Analyst Trainee";
   when ("PA")  Title = "Programmer Analyst";
   when ("A")   Title = "Associate";
   when ("SA")  Title = "Senior Associate";
   otherwise    Title = "Manager";
end;
Subsetting IF statement: The subsetting IF statement causes the DATA step to continue processing only those raw data records or observations that meet the condition specified in the IF statement. General form:
IF condition;
If the condition is true, the DATA step continues to execute.
If the condition is false, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the observation being formed in the PDV is not written to the output dataset.
Example:
Data PASS;
   input ID M1 M2 M3;
   TOT = M1 + M2 + M3;
   if TOT > 150;   /* output obs only if TOT > 150 */
   cards;
1 50 60 80
2 40 60 30
3 70 80 90
;
Run;
Only two observations (those with TOT = 190 and TOT = 240) will be written to the dataset
DO Loop Processing
Statements within a DO loop execute for a specific number of iterations or until a specific condition stops the loop.
Iterative DO:
TYPE 1: DO index-variable=start TO stop <BY increment>;
where,
start - specifies the initial value of the index variable.
stop - specifies the ending value of the index variable.
increment - optionally specifies a positive or negative number that controls the incrementing of the index variable. If no increment is specified, the index variable is incremented by 1.
This iterative DO statement executes the statements between DO and END repetitively, based on the value of the index variable.
Example 1:
do k = Begindate to Today() by 7;
TYPE 2: DO index-variable=item-1 <, ... item-n>;
item-1 through item-n can be either all numeric or all character constants, or they can be variables. The DO loop is executed once for each value in the list.
Example for Type 2:
do Fib = 1,2,3,4;
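Both types of iterative DO can be tried in one small DATA step. This is only a sketch; the dataset LOOPS and the variables K, KIND and VALUE are hypothetical names.

```sas
data loops;
   /* Type 1: start TO stop BY increment; k takes 1, 4, 7, 10 */
   do k = 1 to 10 by 3;
      kind  = 'range';
      value = k;
      output;
   end;

   /* Type 2: the loop runs once for each item in the list */
   do value = 1, 2, 3, 4;
      kind = 'list';
      output;
   end;

   drop k;
run;

proc print data=loops noobs;
run;
```

The dataset LOOPS receives eight observations, four from each loop, because OUTPUT is executed once per iteration.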
DO WHILE
The DO WHILE statement executes statements in a DO loop while a condition is true. General form:
DO UNTIL
The DO UNTIL statement executes statements in a DO loop until a condition is true. General form:
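The difference between the two conditional loops is when the condition is tested. The sketch below uses hypothetical variables N and M and writes its results to the log; DO WHILE tests the condition before each iteration, DO UNTIL tests it after each iteration.

```sas
data _null_;
   /* DO WHILE tests the condition at the top of the loop, */
   /* so the body may run zero times.                      */
   n = 5;
   do while (n < 5);
      n = n + 1;
   end;
   put 'After DO WHILE: ' n=;    /* n is still 5 */

   /* DO UNTIL tests the condition at the bottom of the loop, */
   /* so the body always runs at least once.                  */
   m = 5;
   do until (m >= 5);
      m = m + 1;
   end;
   put 'After DO UNTIL: ' m=;    /* m is now 6 */
run;
```

Both loops start from the same value and use equivalent stopping rules, yet only the DO UNTIL body executes.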
Iterative DO + Conditional DO: The DO WHILE and the DO UNTIL statements can be combined with the iterative DO statement. General form:
DO index-variable=start TO stop <BY increment> WHILE | UNTIL (expression);
   <additional SAS statements>
END;
This is one method of avoiding an infinite loop in DO WHILE or DO UNTIL statements. Sample Program:
data invest;
   do year = 1 to 10 until (Capital > 20000);
      Capital + 5000;
      Capital + (Capital * .075);
      output;
   end;
run;

proc print data=invest noobs;
run;
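Sample Output: (listing not reproduced) The loop ends after year 4, when Capital (about 24041.96) first exceeds 20000, so the dataset INVEST has four observations.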
DELETE statement
The DELETE statement deletes observations from the data set being created. General Form:
IF condition THEN DELETE;
If the condition is true, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the observation being formed in the PDV is not written to the output dataset.
Example:
DATA EMP;
   /* the raw data source (INFILE or DATALINES) is not shown */
   INPUT ID NAME $ SAL;
   IF SAL >= 1000 THEN DELETE;
   SAL = SAL + 500;
RUN;
The SAL = SAL + 500 statement is executed only when the SAL value is < 1000.
PUT STATEMENT
If the PUT statement is used without a FILE statement, it writes the values of variables to the LOG.
General Form: PUT <variable list> <format specifier>;
Use a FILE PRINT; statement before the PUT statement to print the values in the OUTPUT window.
Special SAS Names (Shortcuts):
_NUMERIC_ - refers to all the numeric variables in a dataset
_CHARACTER_ - refers to all the character variables in a dataset
_ALL_ - refers to all the character & numeric variables in a dataset
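A short sketch of both destinations is shown here. The variables ID, NAME, SAL and X and their values are hypothetical.

```sas
data _null_;
   id   = 7;
   name = 'Ann';
   sal  = 1234.5;
   /* No FILE statement: the line is written to the SAS log */
   put id= name= sal= comma8.2;
   /* _ALL_ writes every variable, including the automatic  */
   /* variables _ERROR_ and _N_                             */
   put _all_;
run;

data _null_;
   file print;    /* route PUT output to the procedure output */
   x = 42;
   put 'The value of x is ' x;
run;
```

The name= form labels each value with its variable name, which is convenient when debugging a DATA step.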
2 09/16/92 196
2 09/23/92 202
Code
PROC SORT DATA = DIET;
   BY ID;
RUN;

DATA DIET2;
   SET DIET;
   BY ID;
   RETAIN MEAN_WT;
   IF FIRST.ID THEN MEAN_WT = WEIGHT;
   ELSE MEAN_WT = MEAN_WT + WEIGHT;
   IF LAST.ID THEN DO;
      MEAN_WT = MEAN_WT / 4;
      OUTPUT;
   END;
RUN;

/**** The solution using a sum statement looks like this ****/
DATA DIET2;
   SET DIET;
   BY ID;
   IF FIRST.ID THEN MEAN_WT = WEIGHT;
   ELSE MEAN_WT + WEIGHT;
   IF LAST.ID THEN DO;
      MEAN_WT = MEAN_WT / 4;
      OUTPUT;
   END;
RUN;
Refer File Name: 11.1.sas to obtain soft copy of the program code
How It Works
The BY statement reads the observations in groups and creates the automatic variables FIRST.ID and LAST.ID. Use the automatic variables and the RETAIN statement to calculate the mean WEIGHT of each subject. An alternative way to solve this problem is to use a SUM statement instead of the RETAIN statement.
Summary
The RETAIN statement retains the value of a variable in the PDV across iterations of the DATA step.
The SUM statement is a shortcut that retains and accumulates a variable without an explicit RETAIN statement.
FIRST.BY-variable and LAST.BY-variable identify the first and last observation in each BY group.
The text given in the TITLE & FOOTNOTE statements appears at the top and bottom of every page.
There are different types of conditional statements available in SAS.
The DO loop is used to perform repetitive calculations.
The KEEP & DROP statements are similar to the KEEP & DROP options.
The DELETE statement deletes observations from the data set being created.
The PUT statement writes the values of variables to the LOG.
SAS ODS
SAS Output Delivery System (ODS): ODS is designed to overcome the limitations of traditional SAS output. It lets output from the DATA step and SAS procedures be presented in a more useful and colorful way. Using ODS we can create output in a variety of formats, such as html, xls, pdf and rtf. To start delivering output to ODS, the general syntax is:
ODS HTML FILE = 'C:\SASFILES\TEST.HTML';
< SAS Procedures>
ODS HTML CLOSE;
All output from any procedure between the "ods html ... ;" and "ods html close;" statements is sent to that ODS destination.
XLS: Excel File
ODS HTML FILE = 'C:\SASFILES\TEST.XLS';
< SAS Procedures>
ODS HTML CLOSE;
RTF: RTF stands for Rich Text Format and is supported by MS WORD.
ODS RTF FILE = 'C:\SASFILES\TEST.RTF';
< SAS Procedures>
ODS RTF CLOSE;
PDF:
ODS PDF FILE = 'C:\SASFILES\TEST.PDF';
< SAS Procedures>
ODS PDF CLOSE;
Arrays in SAS
Arrays in SAS are different from arrays in other programming languages. A SAS array is a temporary grouping of variables under a single name. It exists only for the duration of the current DATA step. An array is not a variable. Each variable in an array is called an element and is identified by a subscript that represents the position of the element in the array. When you use an array reference, the corresponding variable is substituted for the reference.
Why use SAS arrays?
To repeat an action or set of actions on each of a group of variables
To create many variables with the same attributes
To write shorter programs
To compare variables
To perform table lookup
General Form:
ARRAY array-name {subscript} <$><length> <array-elements> <(initial-value-list)>;
The ARRAY statement defines the elements in an array. These elements will be processed as a group. You can refer to elements of the array by the array name and subscript. The ARRAY statement:
Must contain all numeric or all character elements
Must be used to define an array before the array name is referenced
Creates variables if they do not already exist in the PDV.
Example:
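A minimal sketch of such a definition follows, assuming four quarterly contribution variables Qtr1 to Qtr4 grouped under the array name CONTRIB. The dataset name, the data values and the 25 percent increase are hypothetical.

```sas
data contrib_demo;
   input Qtr1 Qtr2 Qtr3 Qtr4;
   /* CONTRIB is only a name for the group; no variable */
   /* called CONTRIB is created.                        */
   array contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
   do i = 1 to 4;
      /* contrib{i} stands for Qtr1, Qtr2, Qtr3, Qtr4 in turn */
      contrib{i} = contrib{i} * 1.25;
   end;
   drop i;
   datalines;
10 20 30 40
;
run;

proc print data=contrib_demo noobs;
run;
```

After the step the four variables hold 12.5, 25, 37.5 and 50.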
The array name CONTRIB groups the variables Qtr1, Qtr2, Qtr3 & Qtr4. The individual variables can be accessed by using the array name & a subscript.
Example: Suppose you have a dataset EMP with 50 numeric variables and you have to recode every missing value in them to 99. Without an array, you would have to write a separate IF-THEN statement for each of the 50 variables in the DATA step. With an array, one loop handles them all:
Data All;
   Set EMP;
   array nvar(*) _numeric_;
   do i = 1 to dim(nvar);
      if nvar(i) = . then nvar(i) = 99;
   end;
Run;
nvar(*) - lets SAS determine the number of elements from the variable list
Dim( ) - an array function that returns the number of elements in an array
_numeric_ - a keyword that refers to all the numeric variables
Built-in Functions in SAS: SAS has a number of built-in functions. Broadly they can be classified as:
Arithmetic Functions
String Functions
Date Time Functions
Each of these functions is described below.
INT(argument)
Example            Result
X=INT(2.1)         X=2
X=INT(-2.4)        X=-2
X=INT(3)           X=3
X=INT(-1.6)        X=-1
MAX
Returns the largest of non-missing values. Syntax:
MAX(argument,argument,...)
Example                 Result
X1 = MAX(2,6,.)         X1=6.00000
X2 = MAX(2,-3,1,-1)     X2=2.00000
X3 = MAX(3,.,-3)        X3=3.00000
X4 = MAX(OF X1-X3)      X4=6.00000
OF keyword includes all the variables between X1 and X3 i.e., X1,X2 & X3
MIN
Returns the smallest of non-missing values. Syntax:
MIN(argument,argument,...)
Example                 Result
X1 = MIN(2,.,6)         X1 = 2.00000
X2 = MIN(2,-3,1,-1)     X2 = -3.00000
X3 = MIN(0,4)           X3 = 0.00000
X4 = MIN(OF X1-X3)      X4 = -3.00000
SUM
Returns the sum of the non-missing values. Syntax:
SUM(argument,argument,...)
Example                    Result
X1 = SUM(4,9,3,8)          X1 = 24.00000
X2 = SUM(14,9,13,8,.)      X2 = 44.00000
X3 = SUM(OF X1-X2)         X3 = 68.00000
MEAN
Returns the average of non-missing values. Syntax:
MEAN(argument,argument,...)
Example                 Result
X1 = MEAN(2,.,.,6)      X1 = 4.00000
X2 = MEAN(1,2,3,2)      X2 = 2.00000
X3 = MEAN(OF X1-X2)     X3 = 3.00000
MOD
Returns the remainder when argument1 is divided by argument2. Syntax:
MOD(argument1,argument2)
Example            Result
X=MOD(6,3)         X=0.00000
X=MOD(10,3)        X=1.00000
X=MOD(11,3.5)      X=0.50000
X=MOD(10,-3)       X=1.00000
ROUND(argument,<round-off unit>)
Where, round-off unit is numeric and non-negative. If round-off unit is not provided, argument is rounded to the nearest integer.
Example                   Result
X=ROUND(223.456)          X=223.00000
X=ROUND(223.456,1)        X=223.00000
X=ROUND(223.456,.01)      X=223.46000
X=ROUND(223.456,100)      X=200.00000
CEIL
The CEIL function returns the smallest integer greater than or equal to the argument. Syntax:
NewVar = CEIL(argument);
Example:
X=CEIL(4.4);
X=5
FLOOR
The FLOOR function returns the greatest integer less than or equal to the argument. Syntax:
NewVar=FLOOR(argument);
Example:
Y=FLOOR(3.6);
Y=3;
NumVar = INPUT(source,informat);
Example                                           Result
CVar1='32000';     NVar1=input(CVar1,5.);         NVar1 = 32000
CVar2='32,000';    NVar2=input(CVar2,comma6.);    NVar2 = 32000
CVar3='03may2008'; NVar3=input(CVar3,date9.);     NVar3 = 17655
PUT:
Converts numeric values to character and writes values with a specific format. Syntax:
CharVar = PUT(source,format);
Example                                      Result
NVar1=614;   CVar1=put(NVar1,3.);            CVar1 = '614'
NVar2=55000; CVar2=put(NVar2,dollar7.);      CVar2 = '$55,000'
NVar3=366;   CVar3=put(NVar3,date9.);        CVar3 = '01JAN1961'
The values of Cvar are stored in character form. ** The enclosed quotes are used just to represent that the values are stored in character form.
LENGTH
Returns the length of an argument. Syntax:
LENGTH(argument)
Example                      Result
len = LENGTH('ABCDEF');      len = 6
RIGHT
The RIGHT function returns its argument right aligned. Trailing blanks are moved to the start of the value. Syntax:
RIGHT(argument)
Example:
a = 'due date   ';
b = RIGHT(a);
Variable b will hold the string 'due date' shifted right three spaces, with leading blanks instead of trailing blanks.
LEFT
Left aligns a SAS character expression. Syntax:
LEFT(argument)
Example:
a = '   due date';
b = LEFT(a);
The above statements produce the character string 'due date' shifted left three spaces, with trailing blanks instead of leading blanks.
TRIM
The TRIM function removes trailing blanks from its argument. If the argument is blank, TRIM returns one blank.
Syntax:
TRIM(argument)
Example                                Result
part1 = 'apple ';                      part1 = 'apple '
part2 = 'sauce';                       part2 = 'sauce'
noblank = TRIM(part1) || part2;        noblank = 'applesauce'
hasblank = part1 || part2;             hasblank = 'apple sauce'
Leading blanks will not be removed. To remove both leading & trailing blanks, combine the LEFT & TRIM functions, like:
a = '   due date   ';
b = TRIM(LEFT(a));
Variable b will contain the character string 'due date' without leading and trailing spaces.
LOWCASE
Converts all letters in its argument to lowercase. It has no effect on digits and special characters. Syntax:
NewVal=LOWCASE(argument);
Example                Result
a = 'STRONG';          a = 'STRONG'
b = LOWCASE(a);        b = 'strong'
UPCASE
Converts all letters in its argument to uppercase. It has no effect on digits and special characters. Syntax:
NewVal=UPCASE(argument);
Example                Result
a = 'cognizant';       a = 'cognizant'
b = UPCASE(a);         b = 'COGNIZANT'
NewVar = PROPCASE(char_var);
COMPRESS
Removes specific characters from character expressions. Syntax:
COMPRESS(source<,characters-to-remove>)
where,
source: specifies a SAS character expression.
characters-to-remove: specifies the character or characters you want to remove from the source expression. If the second argument is omitted, blanks are removed by default.
Example                                     Result
a = 'AB C D ';  b = COMPRESS(a);            b = 'ABCD'
p = 'AB CDE';   q = COMPRESS(p,'D');        q = 'AB CE'
REPEAT
Returns a character value consisting of the first argument repeated n + 1 times. Syntax
REPEAT(argument,n)
Example                  Result
a = 'abc';               a = 'abc'
b = REPEAT(a,3);         b = 'abcabcabcabc'
NewVar = SUBSTR(string,start<,length>);
Example                         Result
date = '06MAY89';               date = '06MAY89'
month = SUBSTR(date,3,3);       month = 'MAY'
INDEX
The INDEX function searches a source string value for the location of a specified Sub-string value and returns its location. Syntax:
INDEXC
This function is similar to INDEX function, but the sub-string is considered as separate characters. Locates the first occurrence in the source of characters present in any of the excerpts. If the character string specified by any of the excerpts is not found in the source, value 0 will be returned.
The INDEX function searches for a character string in a source string, but the INDEXC function searches for individual characters. Syntax:
INDEXC(source,excerpt-1<,excerpt-n>)
Example                                  Result
a = 'ABC.DEF (X=Y)';                     a = 'ABC.DEF (X=Y)'
x = INDEXC(a,'0123456789',';()=.');      x = 4
TRANSLATE
It replaces specific characters in a character expression. Syntax:
TRANSLATE(source,to,from)
Values of to and from correspond on a character-by-character basis. TRANSLATE changes character one of from to character one of to, and so on. If to has fewer characters than from, TRANSLATE changes the extra from characters to blanks. If to has more characters than from, TRANSLATE ignores the extra to characters.
Example                              Result
d = TRANSLATE('xyzw','ab','vw');     d = 'xyzb'
TRANWRD
The TRANWRD function replaces or removes all occurrences of a given word (or a pattern of characters) within a character string. Syntax:
NewVal=TRANWRD(source,target,replacement);
Example:
Result: Dessert = Apple pie
VERIFY
Returns the position of the first character in the source string that is not in the check-string Syntax:
VERIFY(source,check-string);
Example                           Result
x = VERIFY('apple','abcdef');     x = 2
In this case, the second character p of the string apple is not present in the check-string abcdef, and so the position of p is returned to the variable x.
SCAN
The SCAN function returns the nth word of a character value. It is used to extract words from a character value when they are separated by delimiters. Syntax:
SCAN(argument,n<,delimiters>)
Delimiter characters: < ( + | & ! $ * ) ; ^ - / , % > \
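As an illustration, the following sketch splits a name on a comma. The variable names and the value are hypothetical; the third argument restricts the delimiters to the comma alone.

```sas
data _null_;
   name  = 'Smith, John';
   /* first word before the comma */
   last  = scan(name, 1, ',');
   /* second word keeps its leading blank, so LEFT aligns it */
   first = left(scan(name, 2, ','));
   put last= first=;
run;
```

The PUT statement writes both extracted words to the log.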
The CAT Functions: The Version 9 family of CAT functions reduces complexity when concatenating strings.
Function   What it Does
CAT        Concatenates two or more character strings, leaving leading and trailing blanks unchanged. Identical to the concatenation operator [ || ].
CATT       Same as CAT, but also trims trailing blanks from each string before concatenation.
CATS       Same as CAT, but also strips both leading and trailing blanks from each string before concatenation.
CATX       Concatenates two or more character strings, stripping both leading and trailing blanks and inserting a user-specified separator between the strings.
separator is one or more characters, placed in single or double quotation marks, to be used as the separator between the concatenated strings.
Example:
Usage                        Result
CAT_FN  = CAT(A,B)           CAT_FN  = "Micky Mouse "
CATS_FN = CATS(A,B)          CATS_FN = "MickyMouse"
CATT_FN = CATT(A,B)          CATT_FN = "Micky Mouse"
CATX_FN = CATX(":",A,B)      CATX_FN = "Micky:Mouse"
Code
/* Solution without arrays: */
DATA NEW;
   SET SCORES;
   X1 = INPUT(SUBSTR(STRING,1,1),1.);
   X2 = INPUT(SUBSTR(STRING,2,1),1.);
   X3 = INPUT(SUBSTR(STRING,3,1),1.);
   X4 = INPUT(SUBSTR(STRING,4,1),1.);
   X5 = INPUT(SUBSTR(STRING,5,1),1.);
   KEEP ID X1-X5;
RUN;

/* Solution using arrays: */
DATA NEW;
   SET SCORES;
   ARRAY X[5] X1-X5;
   DO POINTER = 1 TO 5;
      /* same conversion as above, once per array element */
      X[POINTER] = INPUT(SUBSTR(STRING,POINTER,1),1.);
   END;
   KEEP ID X1-X5;
RUN;
How It Works
Without ARRAYs you have to repeat the same statement once for each variable. X1-X5 refers to all the variables from X1 to X5. Since the X variables do not exist yet, SAS creates them. The INPUT function is used to convert each character value to numeric.
Problem Statement 2
You have clinical data in a SAS data set called CLINICAL which contains information on patient visits. Included in the data set are patient ID, DATE, BILLING (billing number), and DX (diagnosis code). You also have a list of DX codes and their descriptions. Using the following CLINICAL data and the list of DX codes and descriptions, create a new data set, NEW, which contains all the variables in CLINICAL plus a new variable (DESCRIP) which contains the DX description. Use PROC FORMAT and a PUT function as in Example 2 to solve this problem.
Code
PROC FORMAT;
   VALUE DXCODE 1 = 'Cold'
                2 = 'Flu'
                3 = 'Asthma'
                4 = 'Chest Pain'
                5 = 'Maternity'
                6 = 'Diabetes';
RUN;

DATA CLINICAL;
   INFILE 'CLINICAL';
   INPUT ID DATE : MMDDYY8. BILLING DX;
RUN;

DATA NEW;
   SET CLINICAL;
   DESCRIP = PUT(DX,DXCODE.);
RUN;
Refer File Name: 13.2.sas to obtain soft copy of the program code
How It Works
Create a format for the values of Dxcode. Assign the description of DX to a new variable using PUT function and the format.
Problem Statement 3
You have a raw data file called TEMPER which contains temperature measurements taken at one hour intervals. Each raw data line contains several pairs of the variables HOUR (hour of the day) and TEMP (temperature). All temperatures are in degrees Fahrenheit unless they are written in the form nC (the number n followed by a C, no spaces), in which case they are expressed in degrees Celsius. In addition, a value of N was coded when a temperature was not obtained. Write a SAS program to read this data file, express all temperatures in degrees Fahrenheit, and convert each N to a numeric missing value. Hint: The conversion from Celsius to Fahrenheit is: F=9*C/5+32 Some sample records from file TEMPER are as follows: 1 68 2 67 3 N 4 20C 5 72 6 23C 7 75 8 N
Code
DATA TEMP;
   INFILE 'TEMPER';
   INPUT HOUR DUMMY $ @@;
   IF DUMMY = 'N' THEN TEMP_F = .;
   ELSE IF INDEX(DUMMY,'C') NE 0 THEN
      TEMP_F = 9*INPUT(SUBSTR(DUMMY,1,LENGTH(DUMMY)-1),5.)/5 + 32;
   ELSE TEMP_F = INPUT(DUMMY,5.);
   DROP DUMMY;
RUN;
Refer File Name: 13.3.sas to obtain soft copy of the program code
How It Works
Because each raw data line holds more than one observation, the INPUT statement ends with the double trailing @ (@@). The INDEX function checks whether C appears in the value of DUMMY. If it does, the numeric part is extracted with SUBSTR, converted to numeric with INPUT, and changed to Fahrenheit using the given formula. Otherwise the value of DUMMY is converted directly to numeric.
Problem Statement 4
You have an instream raw data file of patient hospital stays with the following file layout:

Starting Column   Length   Format      Description
_______________________________________________
1                 3        character   Subject ID
4                 6        mmddyy      Admission date
10                6        mmddyy      Discharge date
16                8        mmddyyyy    Date of birth

Here are some sample data:
00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952

a) Write a program to create a SAS data set called DATES1, and list the resulting data set with PROC PRINT. Create variables ID, ADMIT, DISCH, and DOB from the given data, and also create the following new variables:
   i. AGE: Age in years on the date of admission (as of the last birthday)
   ii. DAY: Numeric day of the week of admission date (1=Sun, 2=Mon, etc.)
   iii. MONTH: Numeric month of year of admission date (1=Jan, 2=Feb, etc.)
   iv. NoWeek: Number of weeks patient stayed in the hospital
b) Set up the DATA step so that the variables print with the following formats:
   i. ADMIT mm/dd/yy
   ii. DISCH mm/dd/yy
   iii. DOB ddMMMyyyy
Code
DATA DATES1;
   INPUT @1  ID    $3.
         @4  ADMIT MMDDYY6.
         @10 DISCH MMDDYY6.
         @16 DOB   MMDDYY8.;
   AGE    = INT((ADMIT-DOB)/365.25);
   DAY    = WEEKDAY(ADMIT);
   MONTH  = MONTH(ADMIT);
   NOWEEK = INTCK('WEEK', ADMIT, DISCH);
   FORMAT ADMIT DISCH MMDDYY8. DOB DATE9.;
   DATALINES;
00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952
;
RUN;

PROC PRINT DATA=DATES1;
RUN;
Refer File Name: 13.4.sas to obtain soft copy of the program code
How It Works
Because SAS stores a date as the number of days since January 1, 1960, subtracting DOB from ADMIT gives the age in days, and dividing by 365.25 converts it to years (the .25 accounts for leap years). INT keeps only the completed years. The INTCK function returns the number of interval boundaries (WEEK boundaries in this case) crossed between the ADMIT date and the DISCH date.
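Dividing by 365.25 is an approximation and can be off by one year when the admission date falls very close to a birthday. As a hedged alternative, the sketch below compares it with the YRDIF function, assuming a SAS release that supports the 'AGE' basis (SAS 9.3 or later). The data set name AGE_CHECK and the variables AGE_DIV and AGE_YRDIF are illustrative names, not part of the original program; the dates are those of the first sample record.

```sas
/* Sketch: two ways to compute age in completed years */
DATA AGE_CHECK;
   ADMIT = '05JAN1992'D;
   DOB   = '21OCT1946'D;
   AGE_DIV   = INT((ADMIT - DOB) / 365.25);       /* approximation used above   */
   AGE_YRDIF = FLOOR(YRDIF(DOB, ADMIT, 'AGE'));   /* assumes the 'AGE' basis    */
   FORMAT ADMIT DOB DATE9.;
   PUT ADMIT= DOB= AGE_DIV= AGE_YRDIF=;           /* writes both ages to the log */
RUN;
```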
Summary
ODS allows output from the DATA step and SAS procedures to be presented in a more useful and colorful way.
A SAS array is a temporary grouping of variables under a single name.
SAS has a number of built-in functions. Broadly, they can be classified as arithmetic functions, string functions, and date and time functions.

8. What do the following do?
   INPUT
   PUT
   CATX
   SCAN
   SUBSTR
   TRIM
   MOD
9. Create a program for the following requirement.
   Following is the data in a file:
   vinodM24
   yahooF22
   altavistaF18
   googleF20
   Read the data into a single variable and use functions to retrieve them into three variables: Name, Gender, Age.
Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets
Learning Objectives
After completing this session, you will be able to:
   Work with date and time functions
   Describe concatenation
   Perform One-to-One reading
   Perform One-to-One merging
   Perform Match-Merging
   Perform JOINS in the DATA step
DATE or TODAY
Returns the current date as a SAS date value representing the number of days between January 1, 1960 and the current date Syntax:
DATE( ) TODAY()
Example
tday1 = DATE( );
tday2 = TODAY( );
Result
tday1 and tday2 both hold a value equal to the number of days between January 1, 1960 and the date on which the statements are executed.
TIME
Returns the current time of the day as a SAS time value. Syntax:
TIME( )
Example
TT = TIME( );
Result
The SAS System assigns the variable TT the SAS time value corresponding to 14:32:00 if the statement is executed at exactly 2:32 p.m.
DATETIME
Returns the current date and time of day as a SAS datetime value representing the number of seconds between midnight on January 1, 1960 and the current datetime.
Syntax:
DATETIME( )
Example
dttime = DATETIME( );
Result
Variable dttime holds a SAS value representing the number of seconds between midnight on January 1, 1960 and the current datetime.
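The four functions above can be tried together in one DATA _NULL_ step. This is a minimal sketch; the variable names follow the examples in the text, and the values written to the log depend on when the step runs, so no output is shown.

```sas
/* Sketch: current date, time and datetime, raw and formatted */
DATA _NULL_;
   tday1  = DATE();       /* days since 01JAN1960                  */
   tday2  = TODAY();      /* same value as DATE()                  */
   tt     = TIME();       /* seconds since midnight                */
   dttime = DATETIME();   /* seconds since 01JAN1960 at midnight   */
   PUT tday1= tday2= tt= dttime=;                      /* raw numeric values */
   PUT tday1= DATE9. tt= TIME8. dttime= DATETIME20.;   /* formatted values   */
RUN;
```

The raw values are plain numbers; the formats DATE9., TIME8. and DATETIME20. only change how they are displayed.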
Extracting the parts of a SAS Date, Time or Datetime Variable:

Function   Usage                         Description
DAY        DAY(<date>)                   Returns the day of the month from a SAS date value.
MONTH      MONTH(<date>)                 Returns the MONTH value from a SAS date value.
YEAR       YEAR(<date>)                  Returns the YEAR value from a SAS date value.
QTR        QTR(<date>)                   Returns the QTR of the year from a SAS date value. JAN-MAR = 1Q; APR-JUN = 2Q; JUL-SEP = 3Q; OCT-DEC = 4Q
HOUR       HOUR(<time | datetime>)       Returns the HOUR value from a SAS time or datetime value. Hour value ranges from 0 to 23.
MINUTE     MINUTE(<time | datetime>)     Returns the MINUTE value from a SAS time or datetime value.
SECOND     SECOND(<time | datetime>)     Returns the SECOND value from a SAS time or datetime value.

DAY, MONTH, YEAR and QTR expect a SAS date value. To use them on a datetime value, extract the date first with the DATEPART function.
WEEKDAY
Returns a numeric value for the day of the week from a SAS date value. For a datetime value, extract the date first with the DATEPART function.
Syntax:
Wkdy = WEEKDAY(<date>)

1 - SUN
2 - MON
3 - TUE
4 - WED
5 - THU
6 - FRI
7 - SAT
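As a short illustration of the extraction functions and WEEKDAY, the sketch below applies them to one date constant and one datetime constant. The variable names are illustrative only.

```sas
/* Sketch: extracting parts of a date and a datetime value */
DATA _NULL_;
   d  = '21OCT2004'D;             /* a SAS date value     */
   dt = '21OCT2004:13:05:30'DT;   /* a SAS datetime value */
   dd = DAY(d);
   mm = MONTH(d);
   yy = YEAR(d);
   qq = QTR(d);
   wd = WEEKDAY(d);               /* 21OCT2004 was a Thursday */
   hh = HOUR(dt);
   mi = MINUTE(dt);
   ss = SECOND(dt);
   PUT dd= mm= yy= qq= wd= hh= mi= ss=;
   /* log: dd=21 mm=10 yy=2004 qq=4 wd=5 hh=13 mi=5 ss=30 */
RUN;
```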
MDY
Returns a SAS date value from month, day and year values. Syntax:
MDY(month,day,year)
There are separate variables for month, day and year. The MDY function creates a SAS date value from these values.
Where,
month: Specifies a numeric expression representing an integer from 1 through 12.
day: Specifies a numeric expression representing an integer from 1 through 31.
year: Specifies a numeric expression representing a specific year.
Example
m = 8;
d = 27;
y = 90;
date1 = MDY(m,d,y);
Result
date1 holds the value 11196, which is the number of days between January 1, 1960 and August 27, 1990.
DATEPART / TIMEPART
A SAS datetime variable contains information on both the date and the time, i.e., the number of seconds since midnight on January 1, 1960. To extract the date or time part of a SAS datetime variable, use the DATEPART function or the TIMEPART function.
Syntax:
DATEPART(datetime)
TIMEPART(datetime)
Example: Thursday, Oct. 21, 2004 at 1300 hrs is represented as the SAS datetime value 1413982800.
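A minimal sketch of splitting that datetime into its two parts. The variable names dt, d and t are illustrative.

```sas
/* Sketch: DATEPART and TIMEPART on 21OCT2004 at 13:00 */
DATA _NULL_;
   dt = '21OCT2004:13:00:00'DT;
   d  = DATEPART(dt);      /* SAS date value: days since 01JAN1960      */
   t  = TIMEPART(dt);      /* SAS time value: seconds since midnight    */
   PUT d= t=;              /* log: d=16365 t=46800                      */
   PUT d= DATE9. t= TIME8.;
RUN;
```

DATEPART returns a date value that functions such as DAY, MONTH, YEAR and WEEKDAY can then work on.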
Calculating Time Intervals:
There are two ways to calculate the time interval between two dates:
1. Arithmetic operation on SAS date, time or datetime variables, or between a variable and a constant
   YEARS = (date2-date1)/365.25;
   MONTHS = (date2-date1)/30.4;
2. Use of the INTCK function
INTCK
Determines the number of interval boundaries that are crossed between two SAS date, time or datetime values.
Syntax:
INTCK(interval,from,to)
Where,
interval: name of the time interval, given as a character value such as 'YEAR', 'QTR', 'MONTH' or 'WEEK'.
from: SAS date, time or datetime value identifying the START of the time interval.
to: SAS date, time or datetime value identifying the END of the time interval.
The INTCK function counts only the number of interval boundaries crossed between the two values, not the elapsed time.

Example                                          Result       Description
qtr = INTCK('QTR','10OCT88'D,'01MAR89'D);        qtr = 1      Number of QTR boundaries between the two dates, i.e., number of JAN 1, APR 1, JUL 1, OCT 1
date = INTCK('YEAR','31DEC89'D,'1JAN90'D);       date = 1     Number of YEAR boundaries, i.e., number of JAN 1
year = INTCK('YEAR','1JAN89'D,'31DEC89'D);       year = 0     Number of YEAR boundaries, i.e., number of JAN 1
td = '1dec2008'd;
month = INTCK('MONTH','10jan2008'd, td);         month = 11   Number of MONTH boundaries, i.e., first days of a month
INTNX
Advances a SAS date, time or datetime value by a given number of intervals and returns the resulting value.
Syntax:
INTNX(interval,from,no);
Where,
interval: name of the time interval
from: start date
no: integer representing the number of time intervals
By default, the result is the first date of the resulting time interval. For example, if interval is MONTH, INTNX returns day one of the respective month as a SAS date.
Example
BDATE = '05mar2008'd;
DT = INTNX('month',BDATE,3);
Result
The result is a SAS date value representing the first day of the month that is three months past the BDATE value, i.e., 01JUN2008 as a SAS date.
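INTNX also accepts an optional fourth argument that controls where in the resulting interval the returned date falls. The sketch below, with illustrative variable names, contrasts the default with the 'END' and 'SAME' alignments available in SAS 9.

```sas
/* Sketch: INTNX with different alignments */
DATA _NULL_;
   bdate    = '05MAR2008'D;
   dt_begin = INTNX('MONTH', bdate, 3);           /* default: beginning of the month */
   dt_end   = INTNX('MONTH', bdate, 3, 'END');    /* last day of that month          */
   dt_same  = INTNX('MONTH', bdate, 3, 'SAME');   /* same day of the month           */
   PUT dt_begin= DATE9. dt_end= DATE9. dt_same= DATE9.;
   /* log: dt_begin=01JUN2008 dt_end=30JUN2008 dt_same=05JUN2008 */
RUN;
```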
Merging and Combining SAS Data Sets
We can create a data set from two or more existing data sets by:
   Combining Data Vertically (appends the observations from one data set to another data set)
   Combining Data Horizontally (joining observations side-by-side)
Methods to combine SAS data sets
   Combining Vertically
      concatenating
      interleaving
   Combining Horizontally
      one-to-one reading
      one-to-one merging
      match merging
   Updating
Combining Vertically
Appends the observations from one or more data set row-wise to create a resultant dataset.
Concatenating
Concatenating Two Data Sets
Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation. Note that the data sets are processed in the order in which they are listed in the SET statement.
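A minimal sketch of the concatenation described above. The data set names DATA1, DATA2 and COMBINED come from the text; the variables YEAR and SALES and their values are made up for illustration.

```sas
/* Sketch: concatenating two data sets with one SET statement */
DATA DATA1;
   INPUT YEAR SALES;
   DATALINES;
2001 10
2003 30
;
RUN;

DATA DATA2;
   INPUT YEAR SALES;
   DATALINES;
2002 20
2004 40
;
RUN;

DATA COMBINED;
   SET DATA1 DATA2;     /* all of DATA1 first, then all of DATA2 */
RUN;

PROC PRINT DATA=COMBINED;
RUN;
```

COMBINED holds the years in the order 2001, 2003, 2002, 2004, because no BY statement is used.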
Interleaving
Interleaving combines observations from two or more data sets, based on one or more common variables. The resultant dataset COMBINED will be in sorted order. Since we are using a BY statement with SET statement, the Input datasets DATA1 & DATA2 should be sorted by the variable YEAR.
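A minimal sketch of interleaving, using the names DATA1, DATA2, COMBINED and YEAR from the text. The variable SALES and the data values are illustrative assumptions.

```sas
/* Sketch: interleaving two data sets with SET and BY */
DATA DATA1;
   INPUT YEAR SALES;
   DATALINES;
2003 30
2001 10
;
RUN;

DATA DATA2;
   INPUT YEAR SALES;
   DATALINES;
2004 40
2002 20
;
RUN;

/* Both inputs must be sorted by the BY variable first */
PROC SORT DATA=DATA1; BY YEAR; RUN;
PROC SORT DATA=DATA2; BY YEAR; RUN;

DATA COMBINED;
   SET DATA1 DATA2;
   BY YEAR;             /* observations come out in YEAR order */
RUN;

PROC PRINT DATA=COMBINED;
RUN;
```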
Combining Horizontally
Combining data horizontally refers to the process of merging or joining multiple data sets into one data set
One-to-one reading
In a one-to-one match, key values in both the base table and the lookup table are unique. Therefore, for each observation in the base table, no more than one observation in the lookup table has a matching key value. One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on.
The DATA step stops after it has read the last observation from the smallest data set.
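One-to-one reading can be sketched with two SET statements in one DATA step. The data sets and the variables VARA and VARB below are hypothetical; DATA1 has three observations and DATA2 has two.

```sas
/* Sketch: one-to-one reading with multiple SET statements */
DATA DATA1;
   INPUT VARA $;
   DATALINES;
A1
A2
A3
;
RUN;

DATA DATA2;
   INPUT VARB $;
   DATALINES;
B1
B2
;
RUN;

DATA COMBINED;
   SET DATA1;           /* first observation of DATA1 ...            */
   SET DATA2;           /* ... paired with first observation of DATA2 */
RUN;                    /* stops when the smaller data set runs out   */

PROC PRINT DATA=COMBINED;
RUN;
```

COMBINED contains two observations, because the step ends when DATA2 has no more observations to read.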
One-to-one merging
One-to-one merging is similar to one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets.
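The same hypothetical data sets used for one-to-one reading show the difference. This sketch assumes the illustrative variables VARA and VARB.

```sas
/* Sketch: one-to-one merging with MERGE and no BY statement */
DATA DATA1;
   INPUT VARA $;
   DATALINES;
A1
A2
A3
;
RUN;

DATA DATA2;
   INPUT VARB $;
   DATALINES;
B1
B2
;
RUN;

DATA COMBINED;
   MERGE DATA1 DATA2;   /* pairs observations by position */
RUN;

PROC PRINT DATA=COMBINED;
RUN;
```

COMBINED contains three observations. In the third one VARB is missing, because DATA2 has no third observation.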
Match merging
In a one-to-many match, key values in the base table are unique, but key values in the lookup table are not unique. Match-merging combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables. Input data sets DATA1 and DATA2 should be sorted by YEAR before merging.
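A minimal sketch of a one-to-many match-merge by YEAR. DATA1, DATA2 and YEAR are the names used in the text; the variables REGION and SALES and all values are illustrative.

```sas
/* Sketch: match-merging on the common variable YEAR */
DATA DATA1;                     /* one observation per YEAR */
   INPUT YEAR REGION $;
   DATALINES;
2001 East
2002 West
;
RUN;

DATA DATA2;                     /* several observations per YEAR */
   INPUT YEAR SALES;
   DATALINES;
2001 10
2001 15
2002 20
;
RUN;

PROC SORT DATA=DATA1; BY YEAR; RUN;
PROC SORT DATA=DATA2; BY YEAR; RUN;

DATA COMBINED;
   MERGE DATA1 DATA2;
   BY YEAR;                     /* observations are matched on YEAR */
RUN;

PROC PRINT DATA=COMBINED;
RUN;
```

COMBINED has three observations; the REGION value of each year is carried onto every matching DATA2 observation.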
Updating
Updating uses information from observations in a transaction data set to delete, add, or alter information in observations in a master data set. You can update a master data set by using the UPDATE statement or the MODIFY statement. If you use the UPDATE statement, your input data sets must be sorted by the values of the variables listed in the BY statement. If you use the MODIFY statement, your input data does not need to be sorted. By default, UPDATE and MODIFY do not replace non-missing values in a master data set with missing values from a transaction data set
SAS-data-set (IN=variable)
Where, variable is any valid SAS variable name. It is a temporary numeric variable with a value of:
   1 if the data set contributes to the observation
   0 if the data set does not contribute to the observation
The variable will not be written to the data set.
Example:
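A minimal sketch of the IN= data set option. The data sets ONE and TWO, the flags IN_ONE and IN_TWO, and the copies FROM_ONE and FROM_TWO are hypothetical names chosen for illustration.

```sas
/* Sketch: using IN= to see which data set contributed */
DATA ONE;
   INPUT ID A $;
   DATALINES;
1 a1
2 a2
;
RUN;

DATA TWO;
   INPUT ID B $;
   DATALINES;
2 b2
3 b3
;
RUN;

DATA BOTH;
   MERGE ONE(IN=IN_ONE) TWO(IN=IN_TWO);
   BY ID;
   FROM_ONE = IN_ONE;   /* copy the flags, because IN= variables */
   FROM_TWO = IN_TWO;   /* are not written to the output data set */
RUN;

PROC PRINT DATA=BOTH;
RUN;
```

ID 1 comes only from ONE, ID 3 only from TWO, and ID 2 from both, which the two copied flags make visible in the printed output.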
Performing JOINS in DATA Step:
Using the IN= variables we can perform different join operations:
   Equi-Join
   Left Outer Join
   Right Outer Join
   Full Outer Join
Example:
data three;
   merge one(in=x) two(in=y);
   by id;
   <sas join statement> ;
run;
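The join statement is typically a subsetting IF on the IN= variables. The sketch below shows one common mapping for the four join types; it is an illustration under the assumption that ID values are unique in each data set, not the handout's original table. The data sets ONE and TWO and the output names EQUI, LEFTJ, RIGHTJ and FULLJ are hypothetical.

```sas
/* Sketch: four join types with MERGE, BY and IN= */
DATA ONE;
   INPUT ID A $;
   DATALINES;
1 a1
2 a2
;
RUN;

DATA TWO;
   INPUT ID B $;
   DATALINES;
2 b2
3 b3
;
RUN;

DATA EQUI;                       /* equi-join: ID found in both   */
   MERGE ONE(IN=X) TWO(IN=Y);
   BY ID;
   IF X AND Y;
RUN;

DATA LEFTJ;                      /* left outer join: all of ONE   */
   MERGE ONE(IN=X) TWO(IN=Y);
   BY ID;
   IF X;
RUN;

DATA RIGHTJ;                     /* right outer join: all of TWO  */
   MERGE ONE(IN=X) TWO(IN=Y);
   BY ID;
   IF Y;
RUN;

DATA FULLJ;                      /* full outer join: every ID     */
   MERGE ONE(IN=X) TWO(IN=Y);
   BY ID;
   IF X OR Y;                    /* same result as no subsetting IF */
RUN;
```

With this data EQUI keeps ID 2, LEFTJ keeps IDs 1 and 2, RIGHTJ keeps IDs 2 and 3, and FULLJ keeps all three.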
Note:
1. The data are not in ID order.
2. There are some IDs that are in one file only.
Code
PROC SORT DATA=DEMOG;
   BY ID;
RUN;

PROC SORT DATA=SCORES;
   BY SSN;
RUN;

DATA BOTH;
   MERGE DEMOG  (IN=IN_DEMOG)
         SCORES (IN=IN_SCR RENAME=(SSN=ID));
   BY ID;
   IF IN_DEMOG AND IN_SCR;
RUN;

PROC PRINT DATA = BOTH;
RUN;
Refer File Name: 16.1.sas to obtain soft copy of the program code
How It Works
Sort the two data sets separately. Because the BY variable must have the same name in both data sets for the merge, rename the variable SSN in data set SCORES to ID. For the inner join, use the IN= data set option to find the contributing data sets and keep only the observations that are present in both.
Problem Statement 2
Problem Statement: 16.2
You have a MASTER file which contains PART (part number), NUMBER (number in stock), PRICE, and SIZE. The file is sorted by PART. You want to update this file as follows:
   For PART 222, you now have 15 in stock.
   For PART 123, you have a new price of $1,500.
   For PART 333, you have a new price of $2,000 and 20 in stock.

Data set MASTER
PART   NUMBER   SIZE
111    34       A
123    87       B
124    45       A
222    19       C
234    20       A
333    30       B
Code
DATA NEWDATA;
   INPUT PART NUMBER PRICE;
   DATALINES;
222 15 .
123 . 1500
333 20 2000
;
RUN;

PROC SORT DATA=NEWDATA;
   BY PART;
RUN;

DATA MASTER;
   UPDATE MASTER NEWDATA;
   BY PART;
RUN;
Refer File Name: 16.2.sas to obtain soft copy of the program code
How It Works
Verify the contents of the dataset MASTER after updating. You will find that the missing values in the dataset NEWDATA are not updated to the MASTER dataset.
Summary
SAS provides a set of date and time functions for extracting the required information from a SAS date or time variable.
We can create a data set from two or more existing data sets by combining data vertically or combining data horizontally.
We can use the IN= data set option to detect which data set contributed to an observation.
PROC FREQ
The FREQ procedure is a descriptive procedure as well as a statistical procedure that produces one-way and n-way frequency tables. It concisely describes your data by reporting the distribution of variable values. PROC FREQ displays frequency counts of the data values in a SAS data set. It can produce statistics to analyze relationships among variables.
By default, PROC FREQ:
   Analyzes every variable in the SAS data set
   Displays each distinct data value
   Calculates the number of observations in which each data value appears and the corresponding percentage
   Indicates for each variable how many observations have missing values
   Creates a report on every variable of the data set
   Produces percent, cumulative frequency and cumulative percent
Syntax:
   Norow - suppresses printing of row percentages of a crosstab.
   Nopercent - suppresses printing of cell percentages of a crosstab.
PROC FREQ DATA = EMP;
   TABLES DEPTID;
   TITLE3 'One way Freq of DEPTID';
RUN;

Creating Two-Way Tables
To produce a cross-tabulation report on two or more variables, use an asterisk (*) between the variables.

PROC FREQ DATA = EMP;
   TABLES DEPTID * GENDER / LIST;
   TITLE3 'Two-way Freq of DEPTID Vs GENDER';
RUN;
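The crosstab options mentioned earlier can be combined on the TABLES statement. A minimal sketch, assuming a small made-up EMP data set with the variables EMPID, DEPTID and GENDER (the values are illustrative only):

```sas
/* Sketch: a two-way table that shows counts only */
DATA EMP;
   INPUT EMPID DEPTID GENDER $;
   DATALINES;
101 10 M
102 10 F
103 20 F
104 20 F
105 30 M
;
RUN;

PROC FREQ DATA = EMP;
   /* NOROW, NOCOL and NOPERCENT suppress row, column and cell percentages */
   TABLES DEPTID * GENDER / NOROW NOCOL NOPERCENT;
   TITLE3 'Two-way Freq of DEPTID Vs GENDER, counts only';
RUN;
```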
Multi-Threaded Processing
Multi-threaded processing is a type of parallel processing introduced in SAS System 9. Parallel processing means that multiple units of work are available to be scheduled for concurrent execution by the operating system. This technology takes advantage of hardware that has multiple CPUs, called symmetric multiprocessing (SMP) machines.
Processes suitable for threading are:
   sorting
   grouping
   summarizing
The multi-threading capability of SAS improves processing time of the following procedures:
   SORT
   SQL
   MEANS
   SUMMARY
   REPORT
Threaded processing can be controlled via the SAS system option THREADS | NOTHREADS.
General Form:
OPTIONS THREADS | NOTHREADS;
THREADS enables multi-threaded processing. This is the default setting in SAS 9.
NOTHREADS disables multi-threaded processing.
The THREADS | NOTHREADS option can also be specified in the PROC statement, which enables or disables multi-threaded processing of the input data set. When the option is specified in the PROC statement, it overrides the SAS system option THREADS | NOTHREADS.
Example: To enable Multi-threading
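A minimal sketch of both ways to request threading, assuming a made-up work data set NUMS. Whether threads are actually used also depends on the number of CPUs available to the SAS session.

```sas
/* Sketch: enabling threaded processing */
OPTIONS THREADS;                 /* system option for the whole session */

DATA WORK.NUMS;                  /* illustrative data to sort */
   DO I = 1 TO 100000;
      X = RANUNI(1);
      OUTPUT;
   END;
RUN;

/* THREADS in the PROC statement overrides the system option */
PROC SORT DATA=WORK.NUMS OUT=WORK.SORTED THREADS;
   BY X;
RUN;

/* Write the current settings to the log */
PROC OPTIONS OPTION=THREADS;
RUN;
PROC OPTIONS OPTION=CPUCOUNT;
RUN;
```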
PROC MEANS
General Form:
PROC MEANS <DATA=SAS-data-set>;
   CLASS <variable list>;
   VAR <variable list>;
   OUTPUT OUT=SAS-data-set <statistic-keyword=variablename(s)>;
RUN;

Keyword        Description
N              Number of observations with non-missing values
NMISS          Number of observations with missing values
STDDEV / STD   Standard deviation
VAR            Variance
RANGE          Range
Sample Program 2:
PROC MEANS DATA = EMP SUM;
   CLASS DEPTID;
   VAR SALARY;
RUN;
PROC SUMMARY
You can create a summarized output data set by using the SUMMARY procedure. PROC SUMMARY is similar to PROC MEANS in syntax and you can do all the analysis that can be done by PROC MEANS. The difference between the two procedures is that PROC MEANS produces a report by default, but PROC SUMMARY does not. By default, PROC SUMMARY creates only an output dataset.
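A minimal sketch of PROC SUMMARY writing its results to a data set. The EMP data and the names DEPT_SUMMARY, TOTAL_SALARY and AVG_SALARY are illustrative assumptions.

```sas
/* Sketch: PROC SUMMARY creates an output data set, not a report */
DATA EMP;
   INPUT EMPID DEPTID SALARY;
   DATALINES;
101 10 30000
102 10 35000
103 20 40000
104 20 45000
;
RUN;

PROC SUMMARY DATA = EMP NWAY;    /* NWAY keeps only the per-DEPTID rows */
   CLASS DEPTID;
   VAR SALARY;
   OUTPUT OUT = DEPT_SUMMARY
          SUM  = TOTAL_SALARY
          MEAN = AVG_SALARY;
RUN;

PROC PRINT DATA = DEPT_SUMMARY;  /* a separate step is needed to see it */
RUN;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_.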
PROC REPORT
PROC REPORT is another powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool. It combines the features of:
   PROC PRINT
   PROC MEANS
   PROC SUMMARY
   PROC SORT
   PROC TABULATE
Why PROC REPORT?
   PROC REPORT requires less code and time
   It is easy to learn and use
   It is easier to apply ODS style elements in PROC REPORT
Features of REPORT Procedure
   create listing reports
   create summary reports
   enhance reports
   request separate subtotals and grand totals
General Form:
PROC REPORT DATA=SAS-data-set <options>;
   COLUMN column-specifications;
   DEFINE variable / <usage> <attribute-list>;
RUN;

Options:
   WINDOWS | WD - invokes the procedure in an interactive REPORT window. This is the default option.
   NOWINDOWS | NOWD - displays the report in the OUTPUT window.
COLUMN column-specifications;
   Selects and orders the variables that appear in your list report.
   This is similar to the VAR statement in PROC PRINT.
   If omitted, all the variables are used by default.
DEFINE variable / <usage> <attribute-list>;
The DEFINE statement is used to:
   Define how each variable is used in the report
   Assign formats and labels to variables
   Change the order of the values in the report
Usage:
   DISPLAY: Displays values in a column without ordering or grouping (just like PROC PRINT).
   ORDER: Sorts the report in ascending order; a DESCENDING option is also available.
   GROUP: Groups observations into summarization lines.
   ANALYSIS: Returns the requested statistic.
Attribute-list:
   FORMAT = <format name> - assigns a format to a variable
   report-column-header - defines the column header (label) for the column
GROUP <variables> - produce summary reports
DISPLAY & ORDER <variables> - produce listing reports
Example:
PROC REPORT DATA = EMP;
   COLUMN EMPID NAME GENDER SALARY;
   DEFINE EMPID  / ORDER 'Employee ID';
   DEFINE NAME   / DISPLAY 'Name of Employee';
   DEFINE GENDER / DISPLAY;
   DEFINE SALARY / DISPLAY 'Salary of Employee';
RUN;

PROC REPORT DATA = EMP;
   COLUMN DEPTID GENDER SALARY;
   DEFINE DEPTID / GROUP 'Dept ID';
   DEFINE GENDER / GROUP 'Gender';
   DEFINE SALARY / SUM 'Salary of Employee';
RUN;
19 20 2 0 3 0
Code
PROC FREQ DATA = GRADES;
   TABLES EXAMINERA / MISSING;
   TITLE3 'ST FREQ OF EXAMINERA';
RUN;

PROC FREQ DATA = GRADES;
   TABLES EXAMINERA * EXAMINERB / LIST MISSING;
   TITLE3 '2-WAY LIST FREQ OF EXAMINERA * EXAMINERB';
RUN;

PROC FREQ DATA = GRADES;
   TABLES EXAMINERA * EXAMINERB / MISSING;
   TITLE3 '2-WAY BOX FREQ OF EXAMINERA * EXAMINERB';
RUN;
Refer File Name: 18.1.sas to obtain soft copy of the program code
How It Works
By default, PROC FREQ produces n-way frequency as BOX frequency. By including the option LIST, it generates a List frequency. MISSING option includes the missing values also in the report.
Problem Statement 2
Use the dataset BOTH created in Problem 1 and compute the mean IQ and GPA for each value of GENDER. Do this for all the data and then for employees born before January 1, 1972.
Code
PROC MEANS N MEAN DATA=BOTH;
   CLASS GENDER;
   VAR IQ GPA;
RUN;

PROC MEANS N MEAN DATA=BOTH;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
   CLASS GENDER;
   VAR IQ GPA;
RUN;
Refer File Name: 18.2.sas to obtain soft copy of the program code
Problem Statement 3
Generate the report mentioned in problem 18.2 using REPORT procedure.
Code
PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ     / ANALYSIS MEAN;
   DEFINE GPA    / ANALYSIS MEAN;
RUN;

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ     / ANALYSIS MEAN;
   DEFINE GPA    / ANALYSIS MEAN;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
RUN;
Refer File Name: 18.3.sas to obtain soft copy of the program code
How It Works
Here the grouping variable is GENDER and the analysis variables are IQ and GPA. The MEAN option prints the average value for each group. '01JAN72'D is a date constant. The WHERE statement keeps only the observations whose DOB value is less than 01JAN72 and is not missing. N prints the number of observations.
Problem Statement 4
Export the output of problem 18.3 to a RTF file.
Code
ODS RTF FILE='file-specification';

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ     / ANALYSIS MEAN;
   DEFINE GPA    / ANALYSIS MEAN;
RUN;

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ     / ANALYSIS MEAN;
   DEFINE GPA    / ANALYSIS MEAN;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
RUN;

ODS RTF CLOSE;
Refer File Name: 18.4.sas to obtain soft copy of the program code
Summary
The FREQ procedure is a descriptive procedure as well as a statistical procedure that produces one-way and n-way frequency tables.
To produce a cross-tabulation report on two or more variables, use an asterisk (*) between the variables in the TABLES statement.
Multi-threaded processing is a type of parallel processing introduced in SAS System 9.
Threaded processing can be controlled via the SAS system option THREADS | NOTHREADS.
The MEANS procedure displays simple descriptive statistics.
PROC SUMMARY is similar to PROC MEANS.
PROC REPORT is another very powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool.
For example, the following PROC SQL step contains two statements: the PROC SQL and the SELECT statement. The SELECT statement contains several clauses: SELECT, FROM, and WHERE.
proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
   from sasuser.payrollmaster
   where salary<32000;

The PROC SQL step does not require a RUN statement. It executes each query automatically and ends with a QUIT statement. Column and table names in a query are separated by commas, not by spaces as in other SAS statements.

PROC SQL options;
   SELECT column(s)
   FROM table-name | view-name
   WHERE expression
   GROUP BY column(s)
   HAVING expression
   ORDER BY column(s);
QUIT;
A SIMPLE PROC SQL:
Example:
PROC SQL;
   SELECT SUBSTR(STORE,1,3) AS STORENO,
          SALES,
          (SALES * .05) AS TAX,
          (SALES * .05) * .01
   FROM USSALES;
QUIT;
There can be any number of SQL statements in a PROC SQL procedure.
proc sql;
   select membertype, sum(milestraveled) as TotalMiles
   from sasuser.frequentflyers
   group by membertype;
quit;
Here, the SUM function adds the values of the MilesTraveled column to create the TotalMiles column. The GROUP BY clause groups the data by the values of MemberType. The results show total miles by membership class (MemberType). You can use most of the SAS functions in the SQL statements:
Example:
PROC SQL;
   SELECT U.STORENO, U.STATE, F.SALES AS FEBSALES
   FROM USSALES U, FEBSALES F
   WHERE U.STORENO=F.STORENO;
QUIT;

Limiting the number of rows to be read and displayed
OUTOBS= option
To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement. OUTOBS= is similar to the OBS= data set option.
General Form:
PROC SQL OUTOBS=n;
INOBS= option
The INOBS= option restricts the number of rows that PROC SQL takes as input from any single source.
General Form:
PROC SQL INOBS=n;
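A minimal sketch of INOBS= in use. The table WORK.ALL is created here with 20 made-up rows purely for illustration; when fewer rows are read than the table holds, SAS writes a warning to the log.

```sas
/* Sketch: reading at most 5 rows from each source table */
DATA WORK.ALL;
   DO I = 1 TO 20;
      OUTPUT;
   END;
RUN;

PROC SQL INOBS=5;
   SELECT *
   FROM WORK.ALL;     /* only the first 5 rows are read */
QUIT;
```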
WARNING: Only 5 records were read from WORK.ALL due to INOBS= option.

Using Operators in PROC SQL
Comparison, logical, and concatenation operators are used in PROC SQL in the WHERE clause as they are used in other SAS procedures:
For example, the following WHERE clause contains the logical operator AND, which joins multiple conditions, and two comparison operators: an equal sign (=) and a greater than symbol (>).
Example:
proc sql;
   select ffid, name, state, pointsused
   from sasuser.frequentflyers
   where membertype = 'GOLD' AND pointsused > 0
   order by pointsused;
You can also use the following conditional operators. All of these operators can also be used in other SAS procedures.
Calculated Values
The following PROC SQL query creates the new column Total by adding the values of three existing columns: Boarded, Transferred, and Nonrevenue Example:
select flightnumber, date, destination,
       boarded + transferred + nonrevenue as Total
from sasuser.marchflights
where total < 100;

If you use the newly created field Total in the WHERE clause, SAS issues an error message.
Log file:
from sasuser.marchflights
where total < 100;
ERROR: The following columns were not found in the contributing tables: total

This error message is generated because, in SQL queries, the WHERE clause is processed prior to the SELECT clause.
Using the Keyword CALCULATED:
When you use a column alias in the WHERE clause to refer to a calculated value, you must use the keyword CALCULATED along with the alias. The CALCULATED keyword informs PROC SQL that the value is calculated within the query.
Example:
select flightnumber, date, destination,
       boarded + transferred + nonrevenue as Total
from sasuser.marchflights
where calculated total < 100;
This query executes successfully and produces the following output.
To control the formatting of columns in output, you can specify SAS data set options, such as LABEL= and FORMAT=, after any column name specified in the SELECT clause
Note: The data set options LABEL= and FORMAT= are not part of the ANSI standard. These options are SAS enhancements. Example:
proc sql outobs=5;
   title 'Current Bonus Information';
   title2 'Employees with Salaries > $75,000';
   select empid label='Employee ID',
          jobcode label='Job Code',
          salary,
          salary * .10 as Bonus format=dollar12.2
   from sasuser.payrollmaster
   where salary>75000
   order by salary desc;
quit;
The first two columns have new labels, the Bonus values are consistently formatted, and two title lines are displayed at the top of the output.
CONCLUSION
PROC SQL is a powerful tool. It can make your life much easier. For beginner SQL users, remember the following points:
   Be careful about many-to-many table joins in SQL. When joining tables that have multiple records per matching ID, the output table may be a Cartesian product. For example, 3 rows joining 5 rows with the same ID value will produce 15 rows, as compared to the DATA step MERGE where only 5 rows will be created.
   PROC SQL is code-saving, but not always time-saving.
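The 3-by-5 case can be reproduced with two tiny data sets. This is a sketch with hypothetical names (A, B, A_ROW, B_ROW, SQL_JOIN, STEP_MERGE); both tables carry the single ID value 1.

```sas
/* Sketch: many-to-many join in PROC SQL versus DATA step MERGE */
DATA A;                          /* 3 rows with ID = 1 */
   ID = 1;
   DO A_ROW = 1 TO 3;
      OUTPUT;
   END;
RUN;

DATA B;                          /* 5 rows with ID = 1 */
   ID = 1;
   DO B_ROW = 1 TO 5;
      OUTPUT;
   END;
RUN;

PROC SQL;
   CREATE TABLE SQL_JOIN AS
   SELECT A.ID, A.A_ROW, B.B_ROW
   FROM A, B
   WHERE A.ID = B.ID;            /* every A row pairs with every B row: 15 rows */
QUIT;

DATA STEP_MERGE;
   MERGE A B;
   BY ID;                        /* rows are paired by position within ID: 5 rows */
RUN;
```

The DATA step also writes a note to the log that more than one data set has repeats of BY values, which is a useful signal that the merge is many-to-many.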
EmpID   JobCode   Salary
1970    FA1       $31,661
1422    FA1       $31,436
1658    SCP       $25,120
1113    FA1       $31,314
1094    FA1       $31,175
1789    SCP       $25,656
1422    FA1       $31,436
1564    SCP       $26,366
1354    SCP       $25,669
1094    FA1       $31,175
1101    SCP       $26,212

Code
proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
   from emp
   where salary<32000
   order by jobcode;
quit;
Refer File Name: 20.1.sas to obtain soft copy of the program code
Problem Statement 2
With the data set EMP, determine the total salary for each jobcode. Apply a format to the new column. Limit the output observations to 10.
Code
proc sql outobs = 10;
   select jobcode, sum(salary) as Totsal format = dollar13.2
   from emp
   group by jobcode;
quit;
Refer File Name: 20.2.sas to obtain soft copy of the program code
How It Works
The GROUP BY clause summarizes the observations by jobcode. The SUM function finds the sum of Salary for each group. OUTOBS= is similar to the OBS= option and prints only the specified number of observations.
Summary
PROC SQL is a powerful SAS Procedure that combines the functionality of DATA and PROC steps into a single step PROC SQL is SAS' implementation of Structured Query Language (SQL), which is similar to ANSI SQL. Is composed of clauses To group data for summarizing, you can use the GROUP BY clause. A join is used to combine information from multiple files. To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement. Comparison, logical, and concatenation operators are used in PROC SQL in the WHERE clause as they are used in other SAS procedures When you use a column alias in the WHERE clause to refer to a calculated value, you must use the keyword CALCULATED along with the alias. You can improve the appearance of your query output by using column labels and formats titles and footnotes columns that contain a character constant
SAS Macro
The macro facility is one of the most powerful features of Base SAS. SAS macros enable you to substitute text in your SAS programs. When you reference a macro, SAS replaces the reference with the text value that has been assigned to that macro. This makes your programs more reusable and dynamic. In simple terms, the SAS macro facility is a tool for text substitution.

Macros allow users to:
o Write more flexible code
o Pass information between DATA or PROC steps
o Generate SAS statements based on the data

There are two main components of the SAS macro facility:
o Macro variables
o Macro programs

Macro variables are like parameters passed on to a SAS program. Macro programs use macro variables and macro programming statements to build SAS programs.
Using the SAS macro facility:
o SAS programs can become reusable, shorter, and easier to follow
o Repetitive tasks can be accomplished quickly and efficiently
o Results can be customized without changing the code, by passing parameters to the macro program
o SAS code can be executed conditionally
o Debugging is easier
o The date and other session information can be inserted into your code automatically
o You can write more flexible code and pass data between DATA and PROC steps at execution time
Macro variables
Macro variables belong to the SAS macro language and are different from DATA step variables. You can define and use macro variables anywhere in a SAS program, except in DATALINES or CARDS. The %LET statement enables you to define a macro variable and to assign a value to it. General form:
    %LET variable = value;
%LET Statement               Variable Name   Variable Value     Length
%let start=;                 start           (null)             0
%let sum=4+3;                sum             4+3                3
%let total=0+&sum;           total           0+4+3              5
%let x=varlist;              x               varlist            7
%let &x=name age height;     varlist         name age height    15
DATA Step Variable Vs Macro Variable: The following table illustrates the difference between a DATA step variable and a macro variable.

DATA Step Variable                          Macro Variable
Belongs to the SAS language                 Belongs to the SAS macro language
Its value depends on the observation        Contains one value that remains constant
being processed                             until explicitly changed
Is part of the SAS dataset                  Is independent of the SAS dataset
Where we can use Macro Variable: In your SAS programs, you might find that you need to reference the same text string multiple times. Example:
    DATA sales;
        Set DEPT;
        where Dept = "sales";
    run;

    proc print data = sales;
        title "List of employees in sales department";
    run;

You might later need to change these references so that the program refers to a different text string. If your programs are lengthy, updating them manually takes a lot of time and increases the chance of manual errors. If you use a macro variable instead, you only need to make the change in one place and SAS substitutes its value everywhere it is referenced. Example:

    %let dept = sales;

    DATA &dept;
        Set ALL_DEPT;
        where Dept = "&dept";
    run;

    proc print data = &dept;
        title "List of employees in &dept department";
    run;
User-defined macro variables: Macro variables created by the user are user-defined macro variables. For example, you can create a user-defined macro variable with the %LET statement or the CALL SYMPUT routine.
When you submit a program, it goes to an area of memory called the input stack.
o The word scanner reads the program from the input stack and divides the program text into fundamental units called tokens.
    o Tokens are passed on demand to the compiler.
    o The compiler requests tokens until it receives a semicolon.
    o SAS stops sending statements to the compiler when it reaches a step boundary, e.g. a RUN statement.
o The compiler checks the syntax of the tokens received from the word scanner. After it completes checking the syntax, the code is sent for execution.
o The executor executes the code and prints the results to the Log and Output files.

Terms used in Macro Processing:

Term               Description
input stack        Holds a SAS program after it is submitted.
word scanner       Scans the text it takes from the input stack and breaks the text
                   into tokens. Determines the destination of each token: DATA step
                   compiler, macro processor, etc.
token              Fundamental unit in the SAS language. Tokens are the actual
                   keywords in SAS statements as well as literal strings, numbers,
                   and symbols. Ex: DATA, 1234, +, -, =, variable
compiler           Checks the syntax of tokens received from the word scanner. After
                   it completes checking the syntax, the code is sent for execution.
macro processor    Processes macro language references and statements.
macro trigger      The symbols & and %, when followed by a letter or underscore,
                   signal the word scanner to transfer the current statement to the
                   macro processor.
Macro Facility: The macro facility includes a macro processor that is responsible for handling all macro language elements. When a macro trigger is detected, the word scanner passes it to the macro processor for evaluation. The compiler does not recognize macro statements.

Macro Trigger: The word scanner recognizes the following token sequences as macro triggers:
o % followed immediately by a name token (such as %let)
o & followed immediately by a name token (such as &dept)
SAS Program flow of execution with Macro statements:
o When the word scanner encounters a macro trigger, it sends the statement to the macro processor.
o The macro processor processes the macro language references and macro statements and returns SAS code that no longer contains macro statements.
o The resolved SAS code is returned to the input stack and execution continues.

Combining Macro variable reference with text (Concatenation): When text immediately follows a macro variable reference, SAS treats the trailing letters, digits and underscores as part of the macro variable name. End the reference with a period (.) to separate the name from the text that follows. Example:
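A minimal sketch of concatenating macro variable references with text follows. The macro variables dept, year and lib and their values are assumptions chosen only for illustration.

```sas
%let dept = sales;
%let year = 2007;
%let lib  = work;

/* Text placed BEFORE a reference needs no delimiter */
%put all_&dept;

/* A period ends the macro variable name and is not written to the result */
%put &dept._&year;

/* Two periods: the first is the delimiter, the second is literal text */
%put &lib..&dept;

/* Without the period SAS looks for a macro variable named DEPT_,      */
/* which does not exist here, so the log shows an "apparent symbolic   */
/* reference not resolved" warning                                     */
%put &dept_&year;
```

The first three statements should write all_sales, sales_2007 and work.sales to the log. The double-period form is the one most often needed in practice, for example in data = &lib..&dept.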
    %let txt=A is greater than B;
    end;
    else do;
    %let txt= A is lesser than B;
    end;
What will be the value of the macro variable txt? It is not "A is greater than B"; the value is "A is lesser than B". This is because the macro facility performs its work before the SAS program executes, whereas SAS assigns the values of A and B only at execution time. The condition is therefore never evaluated by the macro processor, and both %LET statements are sent to it. The macro processor first executes %let txt=A is greater than B; and then executes %let txt= A is lesser than B; so the latest value, A is lesser than B, is assigned to the macro variable txt. For this reason, %LET cannot be used to assign values that depend on DATA step variables.

The SYMPUT Routine: The DATA step provides functions and CALL routines that enable you to transfer information between an executing DATA step and the macro processor. The SYMPUT routine creates a macro variable during execution time and assigns a value to it. General form:
    CALL SYMPUT(macro-variable, text);
Where macro-variable is the name of the macro variable to create (a quoted literal or a character expression) and text is the value to assign.
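The following sketch shows how CALL SYMPUT avoids the %LET pitfall described above. The values assigned to A and B are hypothetical; the macro variable name txt follows the text.

```sas
data _null_;
    A = 5;   /* hypothetical values, for illustration only */
    B = 3;
    /* CALL SYMPUT runs at DATA step execution time, so the IF condition */
    /* is evaluated before the macro variable is assigned                */
    if A > B then call symput('txt', 'A is greater than B');
    else call symput('txt', 'A is lesser than B');
run;

/* The macro variable is available after the step boundary (RUN) */
%put txt = &txt;
```

With these values the log should show txt = A is greater than B. Referencing &txt inside the same DATA step would not work as expected, because the reference is resolved before the step executes.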
The SYMGET Function: To obtain a macro variable's value during DATA step execution, use the SYMGET function. The SYMGET function returns the value of an existing macro variable. General form:
SYMGET (macro-variable)
Where macro-variable is the name of an existing macro variable, given in quotes. If quotes are not used, the argument is treated as a DATA step variable, and the value of that variable is used as the name of the macro variable to look up.
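A minimal sketch of both forms of the SYMGET argument follows. The macro variable bonusrate, its value, and the output dataset name are assumptions; EMP and Salary come from the earlier examples.

```sas
%let bonusrate = 0.06;   /* hypothetical macro variable */

data emp_bonus;
    set emp;

    /* Quoted argument: the literal name of the macro variable.        */
    /* SYMGET returns a character value, so INPUT converts it to a     */
    /* number before it is used in arithmetic                          */
    rate = input(symget('bonusrate'), best12.);

    /* Unquoted argument: a DATA step variable whose VALUE is the name */
    /* of the macro variable to look up                                */
    mvname = 'bonusrate';
    rate2  = input(symget(mvname), best12.);

    bonus = Salary * rate;
    drop mvname;
run;
```

Both rate and rate2 should hold the same number, since the two calls look up the same macro variable by different routes.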
Create a macro variable bikeclass and assign a value to it and print only those observations with the value of the macro variable. Also use a TITLE statement to display the value of the macro variable.
    PROC PRINT DATA = models;
        WHERE Class = "&bikeclass";
        TITLE "Current Models of &bikeclass Bicycles";
    RUN;
Refer File Name: 22.1.sas to obtain soft copy of the program code
How It Works
Macro variable references are resolved only inside double quotes. If you reference a macro variable within a quoted string, use double quotes and not single quotes.
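The difference between the two kinds of quotes can be checked with a short sketch. The value Mountain is an assumption made for illustration; the handout does not show the value assigned to bikeclass.

```sas
%let bikeclass = Mountain;   /* assumed value, for illustration only */

/* Double quotes: the reference is resolved */
title  "Current Models of &bikeclass Bicycles";

/* Single quotes: the text is kept exactly as typed */
title2 'Current Models of &bikeclass Bicycles';

/* The same comparison written to the log */
%put "&bikeclass" versus '&bikeclass';

/* Clear both title lines again */
title;
```

The %PUT line should show "Mountain" for the double-quoted reference and the unresolved text '&bikeclass' for the single-quoted one.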
Problem Statement 2
A company maintains a dataset with information about every order they receive. For each order, the data include the customer ID number, the date the order was placed, the model name, and the quantity ordered. Here is the data:

    287  15OCT03  Delta Breeze  15
    287  15OCT03  Santa Ana     15
    274  16OCT03  Jet Stream     1
    174  17OCT03  Santa Ana     20
    174  17OCT03  Nor'easter     5
    174  17OCT03  Scirocco       1
    347  18OCT03  Mistral        1
    287  21OCT03  Delta Breeze  30
    287  21OCT03  Santa Ana     25
Every Monday the president of the company wants a detail-level report showing all the current orders. On Friday the president wants a report summarized by customer. Write a SAS program for the above requirement.
Code
    %MACRO reports;
        %IF &SYSDAY = Monday %THEN %DO;
            PROC PRINT DATA = orders;
                FORMAT OrderDate DATE7.;
                TITLE "&SYSDAY Report: Current Orders";
        %END;
        %ELSE %IF &SYSDAY = Friday %THEN %DO;
            PROC MEANS DATA = orders;
                CLASS CustomerID;
                VAR Quantity;
                TITLE "&SYSDAY Report: Summary of Orders";
        %END;
        RUN;
    %MEND reports;

    /* Invoke the macro; the definition alone produces no report */
    %reports
Refer File Name: 22.2.sas to obtain soft copy of the program code
Summary
o The macro facility is one of the most powerful features of Base SAS.
o There are two main components of the SAS macro facility:
    o Macro variables
    o Macro programs
o The %LET statement enables you to define a macro variable and to assign a value to it.
o There are two types of macro variables:
    o automatic macro variables
    o user-defined macro variables
o % and & are considered macro triggers.
o The SYMPUT routine creates a macro variable during execution time and assigns a value to it.
o The SYMGET function returns the value of an existing macro variable.
Macro Programs
A macro is a group of SAS statements that is identified by a name. It is a larger piece of a program that can contain complex logic, including complete DATA and PROC steps, macro statements and macro variables.

General form of a macro:
    %MACRO macro-name;
        macro-text
    %MEND macro-name;

To invoke the macro:
    %macro-name
Example:
    %MACRO printit;
        PROC PRINT DATA = EMP (OBS = 10);
            TITLE "CONTENTS OF DATASET EMP";
        RUN;
    %MEND printit;

    %printit

    PROC SORT DATA = EMP;
        BY EMPID;
    RUN;

    %printit
The program calls the macro twice: first without sorting the data, and then after executing a PROC SORT by EMPID. The SAS statements inside the macro are substituted in place of each %printit reference.

Macro variables Vs Macros:

Macro Variables                               Macros
Start with an ampersand (&)                   Start with a percent sign (%)
Defined using the %LET statement              Defined using the %MACRO and %MEND
                                              statements
Like a standard data variable, except that    A larger piece of a program that can
it does not belong to a dataset and has       contain complex logic, including complete
only a single value, which is always          DATA and PROC steps, macro statements and
character                                     macro variables
Positional Parameter
The following example shows the positional style: Example:
    %MACRO printit(dsname, noobs);
        PROC PRINT DATA = &dsname (OBS = &noobs);
            TITLE "CONTENTS OF DATASET &dsname";
        RUN;
    %MEND printit;

To invoke the macro, use the following syntax with the parameters supplied in the right position:
    %printit(emp, 100)
Keyword Parameter
The following example shows the keyword style: Example:
    %MACRO printit(dsname = &syslast, noobs = 100);
        PROC PRINT DATA = &dsname (OBS = &noobs);
            TITLE "CONTENTS OF DATASET &dsname";
        RUN;
    %MEND;
Where, &syslast and 100 are the default arguments for dsname and noobs respectively. &syslast refers to the most recently created dataset. The above macro can be invoked in different ways.
%printit(noobs = 50)
Since the value of dsname is not provided, it takes the default parameter &syslast.
%printit()
Takes the default arguments for both parameters. In positional style, the parameters must be given in the same order as in the macro definition. In keyword style, the parameters can be given in any order.
%LET A = HAI;
When %LET statements are placed in open code (outside of any %MACRO definition), the macro variables they define have GLOBAL scope. To create a macro variable with GLOBAL scope inside a macro, use the %GLOBAL statement. Example:
%GLOBAL A;
Place this statement above the %LET statement.
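A minimal sketch of global versus local scope follows. The macro name setvars and the macro variables g and l are hypothetical names used only for this illustration.

```sas
%macro setvars;
    %global g;   /* g will survive after the macro ends */
    %local  l;   /* l exists only while the macro runs  */
    %let g = visible outside the macro;
    %let l = visible only inside the macro;
    %put Inside the macro: g=&g;
    %put Inside the macro: l=&l;
%mend setvars;

%setvars

/* g is still defined here; a reference to l at this point would */
/* produce an unresolved-reference warning in the log            */
%put Outside the macro: g=&g;
```

Declaring the scope explicitly with %GLOBAL and %LOCAL is a defensive choice: without %LOCAL, a %LET inside a macro updates an existing global variable of the same name instead of creating a new local one.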
System Options
SAS Log
    110  where fee>&amount;
    SYMBOLGEN:  Macro variable AMOUNT resolves to 975
    111  A = &city;
    SYMBOLGEN:  Macro variable CITY resolves to Dallas
    %MACRO printit();
        PROC PRINT DATA = DEPT (OBS = 75);
            TITLE "CONTENTS OF DATASET DEPT";
        RUN;
    %MEND printit;

    OPTIONS MPRINT;
    %printit()
Log file:
    101  %printit
    MPRINT(PRINTIT):   proc print data= DEPT (obs=75);
    MPRINT(PRINTIT):   title " CONTENTS OF DATASET DEPT";
    MPRINT(PRINTIT):   run;

MLOGIC

The MLOGIC option prints messages that indicate the macro actions taken during macro execution.

General form:
    OPTIONS MLOGIC | NOMLOGIC;

Where MLOGIC specifies that messages about macro actions are printed to the log during macro execution. NOMLOGIC is the default setting, and specifies that these messages are not printed to the SAS log.
Example:
    107  %printit
    MLOGIC(PRINTIT):  Beginning execution.
    NOTE: There were 1 observations read from the dataset WORK.EMP.
    NOTE: PROCEDURE PRINT used:
          real time  0.02 seconds
          cpu time   0.02 seconds
    MLOGIC(PRINTIT):  Ending execution

The SYMBOLGEN, MPRINT and MLOGIC options are typically turned on for development and debugging, and turned off when the application is in production mode.
%PUT statement
The %PUT statement is another way of verifying the values of macro variables. It writes text and the values of macro variables to the SAS log.

General form:
    %PUT text;

Where text is any text string or macro variable reference. The statement can be used virtually anywhere in a program, and it writes the values of user-defined or system-defined macro variables to the SAS log.

To print the values of groups of macro variables, use one of the following arguments with the %PUT statement:

Argument       Result in SAS Log
_ALL_          Lists the values of all macro variables
_AUTOMATIC_    Lists the values of all automatic macro variables
_USER_         Lists the values of all user-defined macro variables

The debugging options and the %PUT statement compare as follows:

Option / Statement   Description
SYMBOLGEN            Writes a message for the resolution of each macro variable
MPRINT               Displays the SAS statements returned by the macro processor
MLOGIC               Traces the beginning/ending of macro execution and any
                     parameter values assigned
%PUT                 Prints the values of the macro variables and text specified
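A short sketch of the %PUT statement and the debugging options used together follows. The macro variables amount and city reuse the names from the SYMBOLGEN log above; the macro showvals is hypothetical.

```sas
%let amount = 975;
%let city   = Dallas;

/* Plain text mixed with macro variable values */
%put The fee limit is &amount for &city;

/* List groups of macro variables in the log */
%put _user_;
%put _automatic_;

/* Switch the debugging options on while developing ... */
options symbolgen mprint mlogic;

%macro showvals;   /* hypothetical macro, for illustration only */
    %put Inside showvals: amount=&amount city=&city;
%mend showvals;
%showvals

/* ... and off again before the program goes to production */
options nosymbolgen nomprint nomlogic;
```

Turning the options off explicitly keeps production logs short, since each of them can add several lines to the log for every macro call.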
    %IF condition %THEN action;
    %ELSE %IF condition %THEN action;
    %ELSE action;

    %IF condition %THEN %DO;
        action;
    %END;
These statements are similar to the standard SAS IF statement. Each keyword starts with a % sign to differentiate it from the standard IF statement. These statements can only be used inside a macro. The conditions and actions can include other macro statements or even complete DATA and PROC steps. If there are multiple statements in an action block, use a %DO-%END block.

%IF Vs IF statement: The following table lists the differences between the macro %IF statement and the standard IF statement.

Macro %IF-%THEN-%ELSE statement              Standard IF-THEN-ELSE statement
Is used only in a macro program               Is used only in a DATA step program
Executes during macro execution               Executes during DATA step execution
Uses only macro variables in logical          Uses DATA step variables and macro
expressions and cannot refer to DATA          variables in logical expressions
step variables
Determines the text/SAS statements to be      Determines the DATA step statement(s)
copied to the input stack                     to be executed

In the example below, if the parameter is PRINT, the PROC PRINT code is substituted in place of the macro invocation (%reportit); otherwise the PROC CONTENTS code is substituted.
Example:
    %MACRO reportit(request);
        %IF &request = PRINT %THEN %DO;
            PROC PRINT DATA = EMP;
            RUN;
        %END;
        %ELSE %DO;
            PROC CONTENTS DATA = EMP;
            RUN;
        %END;
    %MEND;

    %reportit

Iterative processing in Macro
Macros also support iterative processing through iterative %DO statements. These statements are similar to the standard SAS statements and are used to repeat a set of SAS statements a specific number of times. The %DO statement has several forms:
o %DO-%WHILE
o %DO-%UNTIL
o Iterative %DO

Example:

    %MACRO arrayme;
        %DO i = 1 %to 5;
            file&i
        %END;
    %MEND arrayme;

    DATA one;
        SET %arrayme;
    RUN;

The macro call resolves to the following code at execution time:

    DATA one;
        SET file1 file2 file3 file4 file5;
    RUN;
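The other two forms, %DO-%WHILE and %DO-%UNTIL, can be sketched as follows. The macros countdown and countup and their parameters are hypothetical names used only for illustration.

```sas
/* %DO %WHILE: the condition is tested BEFORE each iteration */
%macro countdown(n);
    %do %while (&n > 0);
        %put n = &n;
        %let n = %eval(&n - 1);
    %end;
%mend countdown;

/* %DO %UNTIL: the condition is tested AFTER each iteration, */
/* so the body always runs at least once                     */
%macro countup(limit);
    %local i;
    %let i = 1;
    %do %until (&i > &limit);
        %put i = &i;
        %let i = %eval(&i + 1);
    %end;
%mend countup;

%countdown(3)
%countup(3)
```

Both loops rely on %EVAL for integer arithmetic, because macro variable values are text and &n - 1 on its own would simply be the characters "3 - 1".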
%UPCASE ( argument );
Where, argument is a character string. Example:
    %let date = 05JAN2002;

%substr(&date,3,7) returns the value JAN2002.
%substr(&date,3,3) returns the value JAN.

%LENGTH function
Returns the length of the string. Example:
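A minimal sketch of the macro character functions discussed here follows. The macro variable date reuses the value from the %SUBSTR example; the macro variable city and its value are assumptions.

```sas
%let date = 05JAN2002;
%let city = Dallas;

/* %LENGTH counts the characters in the resolved value */
%put Length of date: %length(&date);

/* %UPCASE converts the resolved value to upper case */
%put Upper-case city: %upcase(&city);

/* %SUBSTR(argument, position, length) extracts part of the value */
%put Month and year: %substr(&date, 3, 7);
```

The three statements should write 9, DALLAS and JAN2002 to the log, respectively.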
General form:
Code
    OPTIONS SYMBOLGEN MLOGIC MPRINT;

    %macro testprnt(data = &syslast, obs = 90, tl = 3);
        proc print data = &data (obs = &obs);
            title&tl "Contents of Dataset &data with &obs observations";
        run;
        title&tl;
    %mend testprnt;

    %testprnt(data = all, obs = 100, tl = 5); /* macro reference */
Refer File Name: 23.1.sas to obtain soft copy of the program code
How It Works
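o &syslast resolves to the name of the most recently created dataset.
o The statement title&tl; with no text clears that title line (and any higher-numbered title lines) after the report has been printed.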
Summary
o A macro is a group of SAS statements that is identified by a name.
o Parameters are values that are passed to the macro at the time of invocation.
o Macro variables come in two varieties: Local and Global.
o SYMBOLGEN, MPRINT, MLOGIC and the %PUT statement are used for debugging macro code.
o You can use macros to control conditional execution of statements.
o Macros also support iterative processing.
o Macro character functions have the same basic syntax as the corresponding DATA step functions and yield similar results.
Syntax Error:
Syntax errors occur when program statements do not conform to the rules of the SAS language. Examples of syntax errors include:
o Misspelling a SAS keyword
o Uninitialized variable
o Variable not found
o Using unmatched quotation marks
o Forgetting a semicolon
o Specifying an invalid statement option
o Specifying an invalid data set option

Example: In the program below, the DATA statement is misspelled, and SAS prints a warning message to the log. Program:
Syntax Error (misspelled keyword)
    date temp;
    WARNING 14-169: Assuming the symbol DATA was misspelled as date.
    x=1;
    run;
    NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
Because SAS could interpret the misspelled word, the program runs successfully and produces the output. SAS interprets the misspelled keywords only in some cases.
Data errors:
Missing values are generated when a data error occurs. Data errors occur in the following scenarios:
o Numeric to character conversion
o Invalid data
o Character field truncated

Data errors occur when some data values are not appropriate for the SAS statements that you have specified in the program. For example, if you define a variable as numeric and assign a character value to it, SAS generates a data error. SAS detects data errors during program execution. When a data error is encountered, SAS does the following and continues to execute the program:
o Writes an invalid data note to the SAS log
o Prints the input line and the column numbers that contain the invalid value in the SAS log, with a rule line above the observation
o Sets the automatic variable _ERROR_ to 1 for the current observation

Example Program:

    DATA EMP;
        INPUT EMPID NAME $ SALARY;
        DATALINES;
    1000 RAJU 1000
    1001 KUMAR $2,561.00
    1002 ABISHEK 4586
    ;
    RUN;
Log file:
Logic Errors
Wrong result, but no error message
Determining Logic Errors: Use the DEBUG option in the DATA statement to help identify logic problems. The DEBUG option provides an interactive interface to the DATA step during DATA step execution. This option is useful to determine:
o Which piece of code is executing
o Which piece of code is not executing
o The current value of a particular variable
o When the value of a variable changes

General form of the DEBUG option:
    DATA data-set-name / DEBUG;
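A minimal sketch of a DATA step started under the debugger follows. The dataset names and the Product and Discount variables are borrowed from the efficiency examples later in this handout; the output dataset name is an assumption. The debugger is interactive, so this is meant to be run in an interactive SAS session rather than in batch.

```sas
/* The / DEBUG option opens the DATA step debugger when the step is run */
data work.discounts / debug;
    set work.dsn2;
    if Product = 'Sofa' then Discount = 0.08;
    else Discount = 0.10;
run;

/* Typical debugger commands, typed at the debugger prompt:      */
/*   step              execute the next statement                */
/*   examine Discount  show the current value of a variable      */
/*   watch Discount    stop whenever the variable changes        */
/*   go                resume execution                          */
/*   quit              leave the debugger                        */
```

Stepping through a few observations and examining Discount after each IF statement answers the questions listed above: which branch ran, which did not, and when the value changed.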
The DATA step is the most problematic part of SAS debugging.

First rule of debugging:
o Always check the SAS log
o Always start at the beginning

For DATA step debugging you can use:
o PUT statements
o Automatic variables (_ALL_, _INFILE_)
o The IN= data set option

Don't limit DATA step debugging strictly to DATA step tools. Also use procedures to debug DATA steps, such as:
o FREQ
o MEANS
o PRINT
o REPORT
o CONTENTS
o DATASETS

If your program is well documented and neatly aligned, debugging is much easier.
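The debugging tools listed above can be combined as in the following sketch. The dataset work.dsn2 and the variables Product and Price are borrowed from the efficiency examples in this handout; the output dataset name and the threshold of 100 are assumptions.

```sas
data work.check;
    set work.dsn2;
    Discount = Price * 0.04;

    /* PUT writes selected values to the log for suspicious rows */
    if Discount > 100 then
        put 'Large discount: ' _n_= Product= Price= Discount=;

    /* Dump every variable when SAS has flagged a data error */
    if _error_ then put _all_;
run;

/* Procedures give a quick overview of what the DATA step produced */
proc freq data = work.check;
    tables Product;
run;

proc means data = work.check n nmiss min max;
    var Price Discount;
run;
```

The variable= form of PUT labels each value with its name, which makes the log lines readable without extra text.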
    data new;
        set old;
        where x > 10;
    run;

    proc means data=new;
        var x y z;
    run;

Here, a new dataset is created for the sole purpose of running a procedure on a subset of the data. Instead, use a WHERE statement in the procedure itself. WHERE statements can be used with procedures that read a SAS dataset.
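A sketch of the more efficient form follows, using the same dataset OLD and variables x, y and z as the example above.

```sas
/* WHERE statement inside the procedure: no intermediate dataset */
proc means data = old;
    where x > 10;
    var x y z;
run;

/* Equivalent form using the WHERE= data set option */
proc means data = old (where = (x > 10));
    var x y z;
run;
```

Both forms read OLD once and skip the extra DATA step, which saves one full pass through the data and the disk space for NEW.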
Sub-setting data from one dataset into multiple datasets can be achieved in one data step instead of many.
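A minimal sketch of splitting one dataset into several in a single DATA step follows. The input dataset work.dsn2 and the Product values come from other examples in this handout; the output dataset names are assumptions.

```sas
/* One pass through the input creates all three subsets */
data work.sofas work.beds work.other;
    set work.dsn2;
    if Product = 'Sofa' then output work.sofas;
    else if Product = 'Bed' then output work.beds;
    else output work.other;
run;
```

Writing three separate DATA steps, each with its own SET and subsetting IF, would read work.dsn2 three times instead of once.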
The DATASETS procedure can perform many housekeeping operations on a dataset, including copying, deleting, and renaming datasets, renaming variables, adding labels or changing formats. It does these operations much more efficiently than DATA step programming because it modifies only the descriptor portion of the dataset, whereas the DATA step reads all the data from the dataset.

Store Data in SAS Datasets: Instead of storing data in a raw data file and reading it again and again, store the raw data in a permanent SAS dataset for later use. SAS reads a dataset faster than an external file.

Keeping only the required variables: When inputting a flat file, input only the variables needed. When inputting a SAS dataset, use a KEEP statement to keep only the variables needed. (Note: DROP will work, but KEEP provides good documentation.) DROP intermediate variables used for calculations. Example:
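The two recommendations above can be sketched as follows. The dataset work.dsn2 and the variables Product and Price appear elsewhere in this handout; the new variable names, the label text and the format are assumptions made for illustration.

```sas
/* PROC DATASETS changes only the descriptor portion: no data is rewritten */
proc datasets library = work nolist;
    modify dsn2;
        rename Product = ProductName;
        label  Price   = 'Unit price';
        format Price dollar10.2;
quit;

/* KEEP= on the input reads only the needed variables;          */
/* DROP= on the output removes the intermediate variable Markup */
data work.dsn1 (drop = Markup);
    set work.dsn2 (keep = ProductName Price);
    Markup    = Price * 0.10;
    SalePrice = Price + Markup;
run;
```

Placing KEEP= on the SET statement, rather than using a KEEP statement, stops the unwanted variables from being read at all, which matters most for wide datasets.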
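    Data work.dsn1;
        /* LENGTH must come before SET to take effect; a numeric length is a whole number of bytes, up to 8 */
        length var1 $ 4 var2 $ 5 var3 6;
        Set work.dsn2;
    run;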
Use WHERE statement for conditional processing: Use the WHERE statement instead of the sub-setting IF statement to filter data if the dataset is large. The WHERE statement filters the data before it is loaded into the PDV, whereas the IF statement filters the data only after it has been loaded into the PDV.

Inefficient Method
    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa';
    run;

Efficient Method
    Data work.dsn1;
        set work.dsn2;
        where Product = 'Sofa';
    run;

Use IF-THEN/ELSE instead of multiple IF statements: Use the IF-THEN/ELSE statement instead of a series of IF-THEN statements. IF-THEN/ELSE skips the remaining conditions once a condition is met, whereas separate IF-THEN statements check all the conditions for every observation.
Inefficient Method
    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa' then Discount = 0.08;
        if Product = 'Bed' then Discount = 0.10;
        if Product = 'Chair' then Discount = 0.12;
    run;

Efficient Method
    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa' then Discount = 0.08;
        else if Product = 'Bed' then Discount = 0.10;
        else if Product = 'Chair' then Discount = 0.12;
    run;
IF/THEN/ELSE
When using a series of IF ... THEN ... ELSE ... statements, list the conditions in descending order of probability, so that the most likely condition is tested first. This saves CPU time. Example:

    IF YEAR LT THISYR THEN OUTPUT OUTOLD;
    ELSE IF YEAR EQ THISYR THEN OUTPUT OUTCUR;
    ELSE OUTPUT OUTBAD;

SORT
Sort only the variables needed. It is faster. Example:
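One way to sort only the variables needed is the KEEP= data set option on the input of PROC SORT, sketched below. The dataset work.dsn2 and the variables Product and Price are borrowed from other examples in this handout; the output dataset name is an assumption.

```sas
/* Only Product and Price are read, sorted and written */
proc sort data = work.dsn2 (keep = Product Price)
          out  = work.dsn2_sorted;
    by Product;
run;
```

Using OUT= also leaves the original dataset untouched, so the full set of variables is still available if it is needed later.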
Place the selection criteria in the right position: Apply the selection criteria first, so that unwanted observations are deleted before the remaining statements are processed.

Inefficient Method
    Data work.dsn1;
        Set work.dsn2;
        Discount = (Price * 0.04);
        Profit = (Price * 0.10);
        if Product = 'Computer';
    run;

Efficient Method
    Data work.dsn1;
        Set work.dsn2;
        if Product = 'Computer';
        Discount = (Price * 0.04);
        Profit = (Price * 0.10);
    run;

Use a subset of data for testing code: When testing a piece of SAS code on a large dataset, use part of the dataset with the OBS= or OUTOBS= option rather than the whole dataset.

    Data work.dsn1;
        set work.dsn2 (obs=1000);
        A = mean(salary);
    run;

    Proc sql outobs=1000;
        create table work.dsn1 as
        select mean(salary) as A
        from work.dsn2;
    Quit;

Compressing large Datasets: Use the COMPRESS= option while creating large datasets to store them in compressed format. Use the OPTIONS COMPRESS=YES; statement at the beginning of the SAS code.

Index the variables used for conditional processing: Create an index on key columns and on columns used for conditional processing, i.e., columns used in WHERE conditions. Searching is faster if the column is indexed. Note that a sub-setting IF statement does not use an index.

Delete Unneeded Datasets: At the end of the program or at strategic points, it is good practice to use PROC DATASETS to delete unneeded datasets from the work or permanent library. This makes room for new datasets. It not only improves performance but, more importantly, shows the intention to the reader.
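The compression, indexing and clean-up recommendations above can be sketched together. The dataset work.dsn2 and the variable Product appear elsewhere in this handout; the names work.big and work.sofas are assumptions.

```sas
/* Session-wide default: new datasets are stored compressed */
options compress = yes;

/* COMPRESS= and INDEX= can also be set for a single dataset */
data work.big (compress = yes index = (Product));
    set work.dsn2;
run;

/* A WHERE condition on the indexed variable can use the index */
data work.sofas;
    set work.big;
    where Product = 'Sofa';
run;

/* Remove the intermediate dataset once it is no longer needed */
proc datasets library = work nolist;
    delete big;
quit;
```

Whether compression or an index actually helps depends on the data: compression pays off mainly for wide datasets with long character variables, and an index pays off mainly when a WHERE condition selects a small fraction of the rows.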
Consolidate program steps using Proc SQL: Consolidate programming steps using the SQL procedure in order to save processing time and resources.

Inefficient
    Data work.dsn1;
        Set work.dsn2;
    Run;

    Proc sort data=work.dsn1;
        By products;
    Run;

Efficient
    Proc sql;
        Create table work.dsn1 as
        Select * from work.dsn2
        Order by products;
    Quit;
Summary
o Errors are classified into:
    o Syntax Error
    o Data Error
    o Logic Error
o First rule of debugging:
    o Always check the SAS log
    o Always start at the beginning
o Programming efficiency is generally characterized by minimizing the use of the following resources:
    o CPU time
    o I/O time
    o Memory
    o Data storage
    o Programming time