Using SAS at Smith
Contents
Availability
Starting SAS
Reading data
Basic statistics
Graphics
Regression
Summary of commands
Useful links
Introduction
This tutorial introduces SAS, a multifaceted statistical software package from SAS Institute that is widely used in business and academic institutions.
We will demonstrate basic functionality of the program using a data set of crime rates in the U.S. Analyses in this example include calculation of summary statistics, recoding of data, graphical displays and estimation of multiple regression models.
Availability
SAS is installed on florence.smith.edu (a Scinix/Linux machine), and can be accessed using ssh (secure shell) from a Scinix/Linux machine. Mac users can also access the program through Terminal (within the Utilities folder). New users may need to request their account be activated (email cats@smith.edu).
Starting SAS
On a Mac, SAS can be opened by selecting the Terminal application from the Utilities folder. Then access your scinix account with the following command:
ssh -X username@florence.smith.edu
You may be asked "Are you sure you want to continue connecting?", and you should type yes. You will also be asked for your password; as you type this password, the cursor will not move - this is normal.
After logging in to the system, run the following command at the shell prompt:
/opt/SAS/SASFoundation/9.2/sas
After a few seconds of startup, you should see a welcome message:
The SAS graphical user interface has four main sections: "Log" keeps track of past commands and any error messages; "Results" lists the results of the procedures that have been run; "Output-projectname" displays the output; and "Program Editor" allows the user to enter commands.
Reading data
Data files in native SAS (.sas7bdat) format can be opened by File --> Open. Data can also be imported in comma separated values (csv) format by File --> Import data.
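The same import can also be written as code rather than done through the menus. The following is a minimal sketch, assuming a copy of crime.csv has been saved in your home directory on the server; the path shown is hypothetical and must be replaced with your own.

```sas
/* Hypothetical path: replace with the location of your own copy of crime.csv */
proc import datafile="/home/username/crime.csv" out=ds dbms=csv replace;
  getnames=yes;   /* take variable names from the first row of the file */
run;
```

The replace option tells SAS to overwrite the data set ds if it already exists, which is convenient when the import is rerun.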
SAS can also load data sets remotely, via the internet. To load a data set in this manner, enter the following within the "Program Editor" window:
filename myurl url 'http://www.math.smith.edu/tutorial/crime.csv';
proc import datafile=myurl out=ds dbms=dlm;
delimiter=",";
getnames=yes;
run;
To actually submit this code, select Run from the menu of the "Program Editor" window, then Submit.
We can view the values of this data set by keying in the command:
proc print data=ds;
run;
after we have successfully loaded the data.
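For larger data sets, printing every row can be unwieldy. As a sketch of a lighter check, assuming the data set is named ds as above, the following lists the variables and then prints only the first five rows:

```sas
/* List variable names, types and the number of observations */
proc contents data=ds;
run;

/* Print only the first five rows of the data set */
proc print data=ds (obs=5);
run;
```

Checking the output of proc contents is a quick way to confirm that the variable names were read from the first row of the file.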

Basic Statistics
The command
proc means data=ds mean min q1 median q3 max;
var murder urbanpop assault rape;
run;
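will have SAS produce the five-number summary (minimum, first quartile, median, third quartile and maximum), together with the mean, for each of the listed variables.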
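Recoding is typically done in a data step. The following is only an illustrative sketch: the new data set name ds2, the new variable highurban and the cutoff of 65 are all assumptions made for this example, not part of the original analysis.

```sas
/* Sketch: recode urbanpop into a 0/1 indicator.
   ds2, highurban and the cutoff 65 are hypothetical choices. */
data ds2;
  set ds;
  if urbanpop = . then highurban = .;        /* keep missing values missing */
  else if urbanpop >= 65 then highurban = 1;
  else highurban = 0;
run;

/* Tabulate the recoded variable to check the result */
proc freq data=ds2;
  tables highurban;
run;
```

Writing the result to a new data set leaves ds unchanged, so the later commands in this tutorial still run as written.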

Graphics
SAS can produce many types of statistical graphs. One common graph is a histogram, which allows us to view the distribution of a variable. Here, we want to generate a histogram for the variable Murder, and fit a Normal curve to the data. This graph can be generated by the following command:
proc univariate data=ds;
var murder;
histogram murder/normal(color=black l=1);
run; quit;

A two-way scatter plot of Murder against Urbanpop can be produced by entering the command
proc gplot data=ds;
plot murder * urbanpop;
run;
quit;
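A fitted line can make the trend in the scatter plot easier to see. As a sketch, assuming the same data set ds, a symbol statement placed before proc gplot asks for filled dots and a least-squares line:

```sas
/* interpol=rl overlays a linear regression line on the points */
symbol1 value=dot color=black interpol=rl;
proc gplot data=ds;
  plot murder * urbanpop;
run;
quit;

/* symbol statements are global; reset them so later plots are unaffected */
goptions reset=symbol;
```

Without the reset, the same symbol settings would carry over to any later proc gplot call in the session.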

Linear regression
To fit a multiple linear regression model with Murder as the outcome and UrbanPop, Assault and Rape as predictors, we use the following command:
proc glm data=ds;
model murder=urbanPop assault rape;
output out=ds residual=resid_murder predicted=pred_murder;
run;
A table of results will be generated, and has been reproduced:

In the above command for regression, we saved the fitted values and residuals, to be used for diagnostic analysis of the model. Below, we display the QQ-plot and histogram of residuals as well as a residual-versus-fit plot.
QQ-Plot
proc univariate data=ds;
qqplot resid_murder/normal(mu=est sigma=est color=black);
run;

Histogram of Residuals
proc univariate data=ds;
histogram resid_murder/normal(mu=est sigma=est color=black);
run;

Residual-versus-Fit Plot
proc gplot data=ds;
plot resid_murder * pred_murder;
run;
quit;
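As an aside, the same model can also be fitted with proc reg, which offers additional regression diagnostics. The sketch below is one possible alternative, not the approach used in the rest of this tutorial; it requests variance inflation factors as a check for collinearity among the predictors.

```sas
/* Alternative fit of the same model; vif adds variance inflation factors */
proc reg data=ds;
  model murder = urbanpop assault rape / vif;
run;
quit;
```

The coefficient estimates should match those from proc glm, since both procedures fit the same least-squares model.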

If you'd like to explore more, visit the UCLA ATS SAS page.
Summary of commands
options ls=70 nocenter;
filename myurl url 'http://www.math.smith.edu/tutorial/crime.csv';
proc import datafile=myurl out=ds dbms=dlm;
delimiter=",";
getnames=yes;
run;
proc print data=ds;
run;
proc means data=ds min q1 mean median q3 max;
var murder assault urbanpop rape;
run;
proc univariate data=ds;
var murder;
histogram murder/normal(mu=est sigma=est color=black);
run;
proc gplot data=ds;
plot murder * urbanpop;
run; quit;
proc glm data=ds;
model murder = urbanpop assault rape;
output out=ds residual=resid_murder predicted=pred_murder;
run;
proc univariate data=ds;
var resid_murder;
qqplot resid_murder/normal(mu=est sigma=est color=black);
histogram resid_murder/normal(mu=est sigma=est color=black);
run;quit;
proc gplot data=ds;
plot resid_murder * pred_murder;
run; quit;
Useful links
SAS Website: Resources to help you learn and use SAS
UMass SAS Online Tutorial
PennState SAS Tutorial