Making charts and graphs in SAS is easy!

Charts and graphs are effective ways to represent data to an audience.

A pie chart is best for showing the proportion of observations that fall into each category of a nominal-level variable.
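The proportions a pie chart displays can also be printed as a table. This is a minimal sketch, assuming the same `slid` data set and `language` variable used in the pie chart code: PROC FREQ's one-way table lists the frequency and percent of each category, which are the numbers the slices represent.

```sas
/* One-way frequency table: count and percent of observations in each language category */
PROC FREQ DATA=slid;
  TABLES language;
RUN;
```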

Code

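```sas
PROC GCHART DATA=slid;
  /* VALUE=, PERCENT=, and SLICE= are options of the PIE statement,
     so they go after the slash and before the semicolon */
  PIE language / DISCRETE VALUE=INSIDE PERCENT=INSIDE SLICE=OUTSIDE;
RUN;
QUIT;
```

We use PROC (procedure) GCHART on the data set slid to make a PIE chart of the variable language. DISCRETE treats each value of language as its own slice. VALUE=INSIDE and PERCENT=INSIDE print each slice's frequency count and percentage inside the slice, and SLICE=OUTSIDE places the slice label (the language category) outside it. RUN submits the statements, and because GCHART stays active until it is told to stop, QUIT ends the procedure.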

Output

The pie chart shows that the overwhelming majority of observations reported speaking English, the next largest group reported other languages, and the smallest group reported French. A pie chart is a quick and effective way to show each category's share of the observations. Many publications do not accept color charts, so fill patterns are an effective alternative to colors.
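A minimal sketch of a black-and-white version of the pie chart, using SAS/GRAPH PATTERN statements with line fills instead of solid colors. It assumes three language categories; PATTERN definitions are assigned to slices in the order the categories appear in the chart, so the specific pattern choices below are only an illustration and may need adjusting for your data.

```sas
/* Illustrative pattern choices: empty, diagonal lines, and crosshatch, all in black */
PATTERN1 COLOR=black VALUE=pempty;
PATTERN2 COLOR=black VALUE=p2n45;
PATTERN3 COLOR=black VALUE=p2x45;

PROC GCHART DATA=slid;
  PIE language / DISCRETE VALUE=INSIDE PERCENT=INSIDE SLICE=OUTSIDE;
RUN;
QUIT;

/* Clear the PATTERN definitions so later graphs return to the default fills */
GOPTIONS RESET=pattern;
```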

Histograms are best for plotting continuous-level variables, whose values, as the name suggests, fall on a continuum. Histograms are very helpful for investigating the distribution of a continuous variable, which is important for deciding whether the variable needs to be recoded.
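One way to back up the visual check with numbers is to ask PROC MEANS for the mean, median, and skewness of the variable. This is a sketch, assuming the `SLID` data set and `age` variable from the histogram example: a mean noticeably larger than the median, together with a positive skewness value, points to a right-skewed distribution.

```sas
/* Summary statistics that describe the shape of the distribution of age */
PROC MEANS DATA=SLID N MEAN MEDIAN STD SKEWNESS MAXDEC=2;
  VAR age;
RUN;
```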

Code

```sas
PROC SGPLOT DATA=SLID;
  HISTOGRAM age;
RUN;
```

We use PROC (procedure) SGPLOT on the data set SLID to create a HISTOGRAM of the variable age. We then end with the RUN statement.

Output

The histogram shows the range of ages among the observations and how often each range of ages occurs. We can also see that the distribution of age does not follow a normal curve and is skewed to the right. This may affect the results of our earlier statistical tests. Note that, by default, SAS reports the frequency of each bar as a percentage of the whole data set rather than as a raw count.
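Two variations on the histogram, sketched under the same assumptions (the `SLID` data set and the `age` variable). SCALE=COUNT switches the y-axis from percentages to raw counts, and a DENSITY statement with TYPE=NORMAL overlays a fitted normal curve so the departure from normality is easier to see.

```sas
/* Histogram with raw counts on the y-axis instead of percentages */
PROC SGPLOT DATA=SLID;
  HISTOGRAM age / SCALE=count;
RUN;

/* Default percent scale, with a normal curve fitted to age overlaid for comparison */
PROC SGPLOT DATA=SLID;
  HISTOGRAM age;
  DENSITY age / TYPE=normal;
RUN;
```

The normal curve is overlaid on the default percent scale in the second step so that the curve and the bars share the same y-axis units.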

Boxplots, often called box-and-whisker plots, represent the quartiles of continuous-level variables. They display the variation in the sample with a box that spans the middle 50% of observations (from the lower quartile to the upper quartile) and 'whiskers' that extend to the observations outside the box. These plots can be made with a single variable or with multiple variables, as we will see below.

Code

```sas
PROC SGPLOT DATA=SLID;
  VBOX age;
RUN;
```

We use PROC (procedure) SGPLOT on the data set SLID to create a VBOX (vertical box plot) of the variable age. We then end with the RUN statement.

Output

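The box plot shows the median of `age` (just above 40) as a horizontal line inside the blue box. The bottom and top edges of the box are the 25th percentile (Q1) and the 75th percentile (Q3) of the distribution. By default, PROC SGPLOT draws each whisker out to the most extreme observation that lies within 1.5 times the interquartile range (Q3 minus Q1) of the box and plots any observation beyond that point as an individual outlier marker; when there are no outliers, the whiskers reach the minimum and maximum recorded values of `age`.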
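To read exact values off the box plot, PROC MEANS can print the numbers the box is built from. This is a sketch under the same assumptions (`SLID`, `age`); SAS procedures can use slightly different quartile definitions, so these values should match the box edges closely but not necessarily exactly. The second step uses the VBOX EXTREME option, which extends the whiskers to the minimum and maximum and does not mark outliers.

```sas
/* Minimum, quartiles, maximum, and interquartile range (Q3 - Q1) of age */
PROC MEANS DATA=SLID N MIN Q1 MEDIAN Q3 MAX QRANGE MAXDEC=1;
  VAR age;
RUN;

/* Whiskers extend to the minimum and maximum; outliers are not identified */
PROC SGPLOT DATA=SLID;
  VBOX age / EXTREME;
RUN;
```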

Code

```sas
PROC SGPLOT DATA=SLID;
  /* CATEGORY= draws one box for each value of sex */
  VBOX age / CATEGORY=sex;
RUN;
```

We use PROC (procedure) SGPLOT on the data set SLID to create a VBOX of the variable age for each category of the variable sex. We then end with the RUN statement.

Output

This box plot is split by the sex of the observations (Female and Male), which lets us compare the distribution of age between the two groups.
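The same summary can be broken down by sex with a CLASS statement, which prints the numbers behind each of the two boxes. This is a sketch, assuming `sex` is the grouping variable used in the CATEGORY= box plot.

```sas
/* Quartiles of age computed separately for each value of sex */
PROC MEANS DATA=SLID N Q1 MEDIAN Q3 MAXDEC=1;
  CLASS sex;
  VAR age;
RUN;
```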

Bar charts are best used to represent ordinal-level variables, showing how the observations are distributed across the categories. We can graph a bar chart of a single variable, or of multiple variables for a direct comparison.

Code

```sas
PROC GCHART DATA=SLID;
  /* DISCRETE draws one bar for each value of sex */
  VBAR sex / DISCRETE;
RUN;
QUIT;
```

We use PROC (procedure) GCHART on the data set SLID to create a VBAR (vertical bar chart) of the variable sex. We then end with the RUN statement, and QUIT ends the procedure.

Output

The bar chart above shows the raw count of observations for each value of the variable sex. We can clearly see that there are more females than males in the dataset, but the difference is not large. Using the results from this bar chart, we could ask ourselves: "Is there a statistically significant difference between females and males across language, education, or wages?"
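For the language part of that question (whether the mix of languages differs between females and males), one option is a chi-square test of association on the two-way table of sex by language. This is a sketch, assuming both variables are categorical as in the charts in this guide; it is one possible test, not the only way to answer the question.

```sas
/* Two-way frequency table of sex by language with a chi-square test of association */
PROC FREQ DATA=SLID;
  TABLES sex*language / CHISQ;
RUN;
```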

Code

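```sas
PROC SGPLOT DATA=SLID;
  /* RESPONSE=age with STAT=mean makes each bar the mean age of its group;
     GROUP=sex with GROUPDISPLAY=cluster places the female and male bars side by side */
  VBAR language / RESPONSE=age STAT=mean GROUP=sex
                  GROUPDISPLAY=cluster;
RUN;
```

We use PROC (procedure) SGPLOT on the data set SLID to create a VBAR (bar chart) of the variable language, with a separate bar for each value of the variable sex within each language and the height of each bar set to the mean of age. We then end with the RUN statement.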

Output

We have broken the observations in each language group down by sex (female and male) and plotted the mean of age within each group.
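PROC MEANS can print the group means that the bars display. This is a sketch, assuming the same `SLID` variables; listing both variables in the CLASS statement gives the mean age for every combination of language and sex.

```sas
/* Mean age for each combination of language and sex */
PROC MEANS DATA=SLID N MEAN MAXDEC=1;
  CLASS language sex;
  VAR age;
RUN;
```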

Scatter plots are best used to graphically show if there is a relationship between two variables and what that relationship may look like.

Code

```sas
PROC SGPLOT DATA=SLID;
  SCATTER Y=wages X=education;
RUN;
```

We use PROC (procedure) SGPLOT on the data set SLID to create a scatter plot, where the y-axis is the variable wages and the x-axis is the variable education. We then end with the RUN statement.

Output

Above is a scatter plot of wages by education: each point is one observation, placed according to its education value (x-axis) and its wages value (y-axis). Scatter plots are very helpful for examining whether two continuous-level variables are related and what that relationship looks like. In this scatter plot we can see some clustering of observations around an education of 15 and wages of 10, which suggests that a relationship may exist. After looking at this graph, we would next want to conduct statistical tests to see whether the relationship is statistically significant.
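Before running formal tests, it can help to overlay a fitted line and to measure the linear association directly. This is a sketch, assuming the same `SLID` variables: TRANSPARENCY= lightens the markers so overlapping points show up as darker clusters, the REG statement adds a least-squares regression line (NOMARKERS avoids drawing the points a second time), and PROC CORR reports the Pearson correlation between education and wages along with its p-value.

```sas
/* Semi-transparent markers plus a fitted regression line */
PROC SGPLOT DATA=SLID;
  SCATTER X=education Y=wages / TRANSPARENCY=0.7;
  REG X=education Y=wages / NOMARKERS;
RUN;

/* Pearson correlation between education and wages */
PROC CORR DATA=SLID;
  VAR education wages;
RUN;
```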

- Last Updated: Jul 22, 2024 9:44 AM
- URL: https://research.library.gsu.edu/sas