Smith College Department of Mathematics and Statistics

Statistical Data Analysis Tools

Using SAS at Smith

Contents

Availability
Starting SAS
Reading data
Basic statistics
Graphics
Linear regression
Summary of commands
Useful links

Introduction

This is a tutorial for using SAS, a multifaceted statistical software package from SAS Institute that is widely used in business and academia.

We will demonstrate the basic functionality of the program using a data set of crime rates in the U.S. The analyses in this example include calculating summary statistics, recoding data, producing graphical displays, and estimating a multiple regression model.

Availability

SAS is installed on florence.smith.edu (a Scinix/Linux machine) and can be accessed with ssh (secure shell) from another Scinix/Linux machine. Mac users can also reach it through the Terminal application (in the Utilities folder). New users may need to request that their account be activated (email cats@smith.edu).

Starting SAS

On a Mac, open the Terminal application from the Utilities folder, then log in to your Scinix account with the following command (replace username with your own user name):

ssh -X username@florence.smith.edu

The -X option turns on X11 forwarding, which the SAS graphical interface needs in order to display its windows on your screen. You may be asked "Are you sure you want to continue connecting?"; type yes. You will also be asked for your password. As you type it, the cursor will not move; this is normal.

After logging in, start SAS by running the following command at the shell prompt:

/opt/SAS/SASFoundation/9.2/sas

After a few seconds of startup, you should see a welcome message.

The SAS graphical user interface has four main windows: "Log" records the commands you have submitted and any error messages; "Results" lists the output from every procedure run so far; "Output-projectname" displays the output itself; and "Program Editor" is where you type commands.

Reading data

Data files in the native SAS format (.sas7bdat) can be opened with File --> Open. Data in comma-separated values (CSV) format can be imported with File --> Import data.
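
Both kinds of file can also be read by typing code in the Program Editor instead of using the menus. The sketch below assumes hypothetical locations (a directory /home/username/sasdata containing crime.sas7bdat, and a file /home/username/crime.csv); substitute your own paths, and normally use only one of the two steps, since both create the data set ds.

```sas
/* Native SAS data set: LIBNAME points the library reference mylib at a */
/* directory, so mylib.crime refers to crime.sas7bdat in that directory. */
/* The directory below is hypothetical.                                  */
libname mylib "/home/username/sasdata";

data ds;
   set mylib.crime;   /* copy the data into the WORK library as ds */
run;

/* CSV file on disk (hypothetical path). DBMS=CSV implies a comma      */
/* delimiter; REPLACE overwrites ds if it already exists.              */
proc import datafile="/home/username/crime.csv"
            out=ds
            dbms=csv
            replace;
   getnames=yes;   /* take variable names from the first row */
run;
```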

SAS can also read a data set directly from a web address. To load the crime data this way, enter the following in the "Program Editor" window:

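/* Point the file reference myurl at the CSV file on the web */
filename myurl url 'http://www.math.smith.edu/tutorial/crime.csv';
/* Read the file into the data set ds; REPLACE lets you rerun this */
/* step and overwrite an existing ds.                              */
proc import datafile=myurl out=ds dbms=dlm replace;
   delimiter=",";   /* fields are separated by commas */
   getnames=yes;    /* the first row holds the variable names */
run;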

To submit this code, select Run from the menu of the "Program Editor" window, then Submit.
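
PROC IMPORT guesses each variable's name and type from the file, so it can be worth checking what it created before running any analysis. A minimal check, assuming the import above produced the data set ds:

```sas
/* List the variables in ds (in the order they were created), with     */
/* their types (numeric or character), lengths, and the number of rows. */
proc contents data=ds varnum;
run;
```

If murder, urbanpop, assault, or rape is reported as character rather than numeric, the numeric procedures later in this tutorial will not accept it.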

Once the data have loaded successfully, we can view the values in the data set with:

proc print data=ds;
run;
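
The introduction mentions recoding. One way to recode in SAS is a DATA step that creates a new variable from an existing one. The sketch below groups Murder into three categories; the cutoffs of 5 and 10 and the names ds2 and murder_cat are hypothetical choices for illustration, not values from the original analysis.

```sas
data ds2;
   set ds;
   length murder_cat $ 6;   /* room for the longest label, "medium" */
   /* SAS treats a missing number as smaller than any value, so test */
   /* for missing first; otherwise it would be coded "low".          */
   if missing(murder) then murder_cat = " ";
   else if murder < 5 then murder_cat = "low";
   else if murder < 10 then murder_cat = "medium";
   else murder_cat = "high";
run;

/* Count the observations in each category, including missing ones */
proc freq data=ds2;
   tables murder_cat / missing;
run;
```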


Basic statistics

The command

proc means data=ds mean min q1 median q3 max;
   var murder urbanpop assault rape;
run;

produces the five-number summary (minimum, first quartile, median, third quartile, and maximum), along with the mean, for each of the four variables.
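
PROC MEANS accepts other statistic keywords as well. A sketch that also reports the number of nonmissing and missing values, the standard deviation, and the interquartile range (QRANGE), with the display rounded to two decimals:

```sas
proc means data=ds n nmiss mean std min q1 median q3 max qrange maxdec=2;
   var murder urbanpop assault rape;
run;
```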


Graphics

SAS can produce many types of statistical graphs. A common one is the histogram, which shows the distribution of a variable. Here we generate a histogram of Murder and overlay a fitted normal curve with the following command:

proc univariate data=ds;
   var murder;
   histogram murder/normal(mu=est sigma=est color=black);
run; quit;
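
The HISTOGRAM statement can overlay more than one curve, and the INSET statement can print summary statistics on the graph. A sketch, assuming the same traditional graphics output as the command above:

```sas
/* NOPRINT suppresses the descriptive-statistics tables; the graph is still drawn */
proc univariate data=ds noprint;
   /* Overlay a fitted normal curve and a kernel density estimate */
   histogram murder / normal(mu=est sigma=est color=black) kernel;
   /* Print n, mean, and standard deviation in the upper-right corner */
   inset n mean std / position=ne;
run;
```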

A scatter plot of Murder (vertical axis) against UrbanPop (horizontal axis) can be produced with the command

proc gplot data=ds;
   plot murder * urbanpop;
run;
quit;
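
To add a least-squares line to the scatter plot, the SAS/GRAPH SYMBOL statement can request linear-regression interpolation (INTERPOL=RL). A sketch:

```sas
goptions reset=all;                          /* clear earlier graph settings, including SYMBOL */
symbol1 value=dot interpol=rl color=black;   /* plot dots plus a fitted straight line */
proc gplot data=ds;
   plot murder * urbanpop;   /* vertical-axis variable * horizontal-axis variable */
run;
quit;
```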


Linear regression

To fit a multiple linear regression model with Murder as the outcome and UrbanPop, Assault and Rape as predictors, we use the following command:


proc glm data=ds;
   model murder = urbanpop assault rape;
   output out=ds residual=resid_murder predicted=pred_murder; /* add residuals and fitted values to ds */
run;
quit;


SAS then displays the results in the Output window, including the analysis-of-variance table, R-square, and the parameter estimates for each predictor.

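
PROC REG fits the same least-squares model and offers regression-specific options. For example, the sketch below requests variance inflation factors (VIF) to check for collinearity among the predictors; the output data set name regout and the variable names r and p are hypothetical.

```sas
proc reg data=ds;
   model murder = urbanpop assault rape / vif;   /* VIF: collinearity check */
   /* Save residuals (r) and fitted values (p) in a new data set regout */
   output out=regout residual=r predicted=p;
run;
quit;
```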

The OUTPUT statement in the regression command saved the fitted values and residuals for use in diagnosing the model. Below, we display a QQ-plot and a histogram of the residuals, as well as a residual-versus-fit plot.

QQ-Plot

proc univariate data=ds;
   qqplot resid_murder/normal(mu=est sigma=est color=black);
run;

Histogram of Residuals

proc univariate data=ds;
   histogram resid_murder/normal(mu=est sigma=est color=black);
run;
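
The plots can be complemented by formal tests. Adding the NORMAL option to the PROC UNIVARIATE statement produces a "Tests for Normality" table (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling). A sketch for the saved residuals, assuming the ODS table name TestsForNormality to limit the output to that table:

```sas
ods select TestsForNormality;   /* show only the normality-test table for the next step */
proc univariate data=ds normal;
   var resid_murder;   /* residuals saved by the OUTPUT statement in PROC GLM */
run;
```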

Residual-versus-Fit Plot

proc gplot data=ds;
   plot resid_murder * pred_murder;
run;
quit;
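
A horizontal reference line at zero makes it easier to judge whether the residuals are centered and have constant spread. One way to draw it is the VREF= option of the PLOT statement:

```sas
symbol1 value=dot interpol=none color=black;   /* points only, no connecting lines */
proc gplot data=ds;
   plot resid_murder * pred_murder / vref=0;    /* reference line at residual = 0 */
run;
quit;
```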

If you'd like to explore more, visit the UCLA ATS SAS page.

Summary of commands

options ls=70 nocenter;
filename myurl url 'http://www.math.smith.edu/tutorial/crime.csv';
proc import datafile=myurl out=ds dbms=dlm replace;
   delimiter=",";
   getnames=yes;
run;
proc print data=ds;
run;

proc means data=ds min q1 mean median q3 max;
   var murder assault urbanpop rape;
run;

proc univariate data=ds;
   var murder;
   histogram murder/normal(mu=est sigma=est color=black);
run;

proc gplot data=ds;
   plot murder * urbanpop;
run;
quit;

proc glm data=ds;
   model murder = urbanpop assault rape;
   output out=ds residual=resid_murder predicted=pred_murder;
run;
quit;

proc univariate data=ds;
   var resid_murder;
   qqplot resid_murder/normal(mu=est sigma=est color=black);
   histogram resid_murder/normal(mu=est sigma=est color=black);
run;quit;

proc gplot data=ds;
   plot resid_murder * pred_murder;
run; quit;

Useful links

SAS Website
Resources to help you learn and use SAS
UMass SAS Online Tutorial
PennState SAS Tutorial

Created by Zehui Chen and Nicholas Horton, October 14, 2010.
Updated by Sarah Anoke, July 26, 2011.