SPSS is a widely used statistical package in social science. Many of you will have used it before. This worksheet gives the basics for those who have not, using version 15 of SPSS (the computers will regularly be updated, but the commands should mostly be the same).
SPSS has different-numbered versions. For this course, we will use whatever version is installed. The computers in G37 have at least one version & may have some older versions as well. All the versions work similarly, & you should easily be able to use an earlier version once you know the latest one. Be aware that the different versions all use the same kind of data file (& syntax file, which you'll learn about in week 3) but they cannot read each other's output files. This is not a problem because you can transfer your work between different versions using data & syntax files.
To start SPSS, log on to Windows, click the big round button, & select All Programs | SPSS Inc | PASW Statistics XX | PASW Statistics XX.
You'll see a starting box, What would you like to do?, with various options. You will be entering new data, not reading from a file, so select Type in data, click OK, & the SPSS Data Editor screen opens.
If you have never used SPSS (or have, but want to refresh your memory) it is worth looking over the elements of the SPSS screen & menus before using it. If you are experienced with SPSS, go straight to section 3.
When you start SPSS, the first window that is opened for you is the Data Editor: A spreadsheet-like window for defining & entering data. This also includes the Application Window, which has a menu bar & tool bar at the top, underneath the title bar, & a status bar along the bottom. Other windows may be opened for you as appropriate during processing, e.g., an
SPSS is menu driven; that is, most of the features can be accessed by selections from the menus. The main menu bar contains twelve items, as follows:
File: Create & open files (either SPSS files or from other software packages)
Edit: Modify, copy, cut, paste text, insert variables, etc
View: Display toolbars & status bar, change fonts, & display gridlines & value labels
Data: Sorting, merging, selecting, & weighting cases in the current data file
Transform: Change selected variables, recode & compute new variables
Analyze: Select statistical procedures to analyse your data
Direct Marketing: huh?
Graphs: Create a wide range of graphs from your data
Utilities: Display variables, list variable information, define sets of variables for analysis, run scripts, & edit menus
Add-ons: Help IBM make more money from SPSS products
Window: Move between, arrange, select, & control the various
Help: Get help about any feature within SPSS. Help is varied & extensive - you will find some guidance on it in the last section of this handout.
Look at each menu in turn. Note which items appear under each menu. The items listed in bold are those available or appropriate at this point.
SPSS for Windows Tool Bar
The tool bar underneath the menu bar allows you to access many of the most commonly used features. To find out what each of the options along the tool bar does, place the cursor over the tool for a moment & look at the pop-up box, where a short description will be displayed.
SPSS is an extensive system & you will often want to find help or information that is not in my handouts. The last section of this handout gives guidance on using online help, & recommends some books.
The table below shows part of the results of a survey about workers. Each row (case) contains information (seven variables) about a single person. The variables & their values are defined below.
Which type of variables are these?
And, are these variables?
In SPSS, you define variables & enter data in the Data Editor. The Data Editor is a window that looks like a spreadsheet, with an array of cells in which you enter data. Unlike a spreadsheet, you can only enter data, not formulae.
The Data Editor is illustrated below.
The Data Editor has two views, Data View & Variable View (see tabs at bottom left of screen). The illustration below shows the
This empty window is called Untitled1 [Dataset0]. Versions 14 onwards of SPSS allow you to have more than one Data Editor window open (
In Data View, rows are cases (a participant or other unit of observation), & columns are variables. The variables are unnamed by default, labelled simply
The Workers Survey data have seven variables, so you will use seven columns, naming them after each variable.
The Variable View of the Data Editor is where you define the characteristics of your variables. It is possible to enter data without naming variables, but it is better, & avoids errors, if you start by defining the variables. To do so, click the Variable View tab at bottom left, & you see this:
In Variable View, each row corresponds to a variable. In each row you can define the name, type, & other characteristics of the variable.
Type the names of the first two variables from the data set,
ethnicgp, in the first two rows under Name. Variable names can be in lowercase or capitals (lowercase is preferable). We have used variable names that are short, no more than eight characters. After each variable name, press Enter. The screen will now look as follows:
SPSS inserts the default variable characteristics each time you type in a new variable name. By default, it assumes variables will be of the type numeric, have a maximum Width of eight digits including decimals, & two decimal places. Other parameters are undefined by default.
For the moment, return to Data View, & you will see the two new variable names at the top of the first two columns.
Paste) in the
ethnicgp data from the 12 subjects, copying from the table above. To correct any errors, select the cell & re-type.
SPSS automatically gives the data two decimal places, & allows up to eight digits, as specified in the variable characteristics.
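The distinction between a stored value & the way it is displayed can be illustrated outside SPSS. The sketch below is Python, not SPSS syntax, & is only an analogy: the function name display & the value 3 are made up for illustration, & it assumes that Width & Decimal Places affect only how a number is shown.

```python
def display(value, width=8, decimals=2):
    """Format a stored number for display with a given width and decimals."""
    return f"{value:{width}.{decimals}f}"


stored = 3  # hypothetical value typed into a cell

default_view = display(stored)                    # width 8, two decimal places
tidy_view = display(stored, width=1, decimals=0)  # width 1, no decimal places

# The two strings look different, but both describe the same stored number.
print(repr(default_view))
print(repr(tidy_view))
```

Changing the display settings alters the text on screen, while the underlying number is unchanged.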
You can leave the Decimal Places as they are, but it's neater to change them. Return to Variable View & select the Type cell for either of the variables (in the illustration above it's selected for ethnicgp). A small button appears at the right-hand end of the cell as shown. Click this & you see a Variable Type dialogue box:
Change the number of Decimal Places to zero. Change the Width to the maximum number of digits needed for values
OK. Repeat the process for the variable
You see that this box allows you to do other things, e.g., change variables from numeric to other types (note that numeric here does not imply that the data are interval data). We recommend that you generally make your variables numeric, because SPSS statistical procedures don't always work well with other types.
Return to Data View to see the effect of the changes.
Before entering more data, go back to Variable View, type in the names of the remaining variables (sex, etc.), & define the
Decimal Places. Instead of the
Type, you can click on
Decimal & change the numbers there. (For any continuous variables, leave Decimals set to two rather than changing it to zero.)
SPSS allows you to label your variables, which often makes the output of analyses easier to understand. There are two types: Variable labels, & Value labels.
A Variable label is simply a more detailed description of what that variable is (e.g., for the variable yrs you might wish to add the variable label Number of years working for this company). (Alternatively, you can use longer variable names, but these can become difficult to work with.)
Value labels are often helpful for a variable which has a small number of values (a discrete variable); for example
ethnicgp, whose values are 1, 2, 3, & 4. By adding value labels, you record that value "1" has label "White", value "2" has label "Asian", etc.
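The idea of a value label is simply a lookup from a numeric code to a piece of text. The sketch below is Python, not SPSS syntax; it uses only the two labels named above (codes 3 & 4 would be added in the same way), & the data values & the helper name label_for are hypothetical.

```python
# Value labels as a lookup table: numeric code -> descriptive text.
# Only the two labels given in the worksheet text are included here.
ETHNICGP_LABELS = {1: "White", 2: "Asian"}


def label_for(code, labels):
    """Return the label for a code, or the code itself (as text) if unlabelled."""
    return labels.get(code, str(code))


codes = [1, 2, 2, 1, 3]  # hypothetical data values
labelled = [label_for(code, ETHNICGP_LABELS) for code in codes]
print(labelled)
```

The stored data remain numeric; the labels only change what is shown to the reader.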
Return to Variable View. If you want to provide a Variable label for yrs, click in the Label column opposite this variable, type in the description & press Enter.
Value labels, as noted above, are often used for discrete variables, especially where it is not obvious what the values stand for. These may be categorical or ordinal variables.
To define value labels for ethnicgp, click in the Values cell of that variable, & a button appears (as shown below). Click the button to obtain the Value Labels dialogue box.
To add the first
Value Label text box, type "White"
And add the rest of the value labels...
Note: Value 0 indicates a subject whose ethnic group is not known. Such subjects must be omitted from any analyses that use the variable ethnicgp. In the next section you will define 0 as a Missing Value for ethnicgp. So it is not essential to provide a label for value 0, although it may be helpful. Occasionally, it is important to label missing values, e.g., when there is more than one type of missing value.
When you have labelled all the values you want to, click
Repeat the labelling procedure for any other variables in the data set that you wish. They do not all require
Shortcut: The two satis variables have identical value labels. To save unnecessary typing, you can define the Value Labels for one of them, & Copy these for the other one. Define the labels for satis1, then select its Labels cell, click Copy on the menu, select the Labels cell for satis2, & click Paste. The same labels will be inserted.
After you have defined Value Labels, you can make SPSS display either the labels or the numeric values in the Data Editor. To control this (& to toggle on/off), click on
It is essential to tell SPSS what values of a variable correspond to missing data, e.g., where data were lost, or a participant refused to reply. This will ensure that SPSS omits such cases from any analyses that use that variable. If it tried to include them, the analysis would not make sense.
SPSS has a standard missing value code (.) for numeric data values that are missing: This is known as a System Missing value. However, it is often helpful to specify user missing values, as follows.
Looking at the variable descriptions in Section 3, you see that, for many variables (e.g., ethnicgp), 0 signifies a missing value (Not Known), but, for one variable (daysabs), a special value 999 signifies missing values. A value of 0 would not be appropriate because it's a possible real value: A person could be absent from work for 0 days in a year. To convey this information to SPSS, go to Variable View, click on the Missing cell for the variable & click the button to produce the Missing Values dialogue box.
This provides options for defining missing values. By default, SPSS treats all values as valid (not missing) so the initial setting is No missing values. You can declare up to three separate, discrete values, & specify these as missing, or a range of missing values.
SPSS allows up to three missing values because, e.g., in survey research there might be different reasons for missing values (e.g., refused, not applicable, don't know), & you might want to distinguish these. All of these responses can be given different numeric codes, & Value Labels, & then defined as missing values, allowing these responses to be treated differently in the subsequent analysis.
In the Workers Survey data set, some variables have 0 as the missing-value or No Response code, & daysabs has 999 for missing values.
To specify this as a missing value to
Missing cell for variable daysabs & click the button
Discrete missing values radio button
Specify 0 as a missing value for ethnicgp, using the above procedure.
Some other variables have 0 specified as a missing value, so insert these as well. Be careful not to use 0 as a missing value if it is a possible real value for that variable.
It is good practice to define suitable missing values for all variables. (There may not be missing data among these 12 subjects, but you might add other subjects later who do have missing data.) So follow the equivalent steps for all the variables in the data set.
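To see why declaring missing-value codes matters, the following sketch shows what happens to a mean when the code 999 is treated as a real value. It is Python, not SPSS syntax; the data values, the USER_MISSING table & the helper name valid_values are hypothetical, & None stands in for a system-missing value.

```python
from statistics import mean

# User-missing codes per variable, as declared in the worksheet text.
USER_MISSING = {"daysabs": {999}, "ethnicgp": {0}}


def valid_values(values, variable):
    """Drop system-missing (None) and user-missing codes for this variable."""
    missing = USER_MISSING.get(variable, set())
    return [v for v in values if v is not None and v not in missing]


daysabs = [0, 3, 999, 5, None, 12]  # hypothetical data; 0 is a real value here

valid = valid_values(daysabs, "daysabs")
naive_mean = mean(v for v in daysabs if v is not None)  # wrongly includes 999
correct_mean = mean(valid)                              # 999 excluded

print(valid, naive_mean, correct_mean)
```

Note that 0 is kept for daysabs, because it is a legitimate number of days, whereas the same value would be dropped for ethnicgp.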
Displaying Variable Information. To check the characteristics of any of the variables in your file, look at the Variable View screen, or go to Data View & select Utilities | Variables. This displays a dialogue box of all the variables you have specified. To choose one of the variables, click on the variable name on the left-hand side of the dialogue box, so that it is highlighted. Information about that variable is then displayed on the right-hand side of the dialogue box.
If you move or scroll over to the right side of the Variable View screen, you find three more characteristics not yet discussed:
Columns is optional (it sets the maximum width of the column, e.g. if you need more space to display the variable name).
Align can be left as it is.
Measure should be looked at. There are three options: You can specify the level of measurement of the variable as scale (i.e., interval- or ratio-scale numeric data), ordinal, or nominal. Click on each Measure cell, & enter a choice. Insert the appropriate values for each variable in this column. If you don't enter these, SPSS will "guess" which levels of measurement your variables have, & will not necessarily guess correctly. It is useful to record the correct information about your variables, because the SPSS help uses this terminology in explanations, & the graph-drawing procedures expect it.
Save your data file to your drive:
SPSS data files have the default extension .sav.
The Menu Bar option Analyze offers a wide range of procedures. We will first look at the Frequencies command, which comes under Descriptive Statistics.
Frequencies is a useful general-purpose procedure, which produces selected descriptive statistics (central tendency, dispersion, skewness, & kurtosis) & charts (e.g., bar charts) for individual variables. You must decide what are the suitable descriptives or charts for the variable chosen. In this section, we illustrate its use for
Descriptive Statistics |
On the left side of the dialogue box is a list of all your variables. In this example, the variables are listed in the same order as in the Data Editor. They may appear differently on different machines (e.g., in alphabetical order). In the list, highlight sex & ethnicgp. To highlight more than one item, click the first, then hold down the Ctrl key while clicking others.
Why are sex and ethnicgp the most suitable variables for this procedure?
Click on the right-pointing arrow, which transfers the highlighted variables to the list on the right. Now click on the Charts button & select Bar Chart. Click Continue, which returns you to the Frequencies dialogue box.
Click the Statistics button, which allows you to display various summary statistics. Some of these (e.g., medians, means) only make sense for truly numeric (ordinal or interval) variables, & not for categorical variables such as sex. Others (e.g., quartiles) are applicable to distributions of any variable, but they are of little use with this very small sample. So leave all statistics un-selected, & click Continue. To run the Frequencies procedure, click OK.
The output of the procedure, including the chart, will be written to the Output1 - SPSS viewer window.
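What a frequency table computes can be sketched in a few lines. The following is Python, not SPSS syntax, & is only an illustration of the arithmetic: the 12 data values & the function name frequency_table are hypothetical, & the layout of the real SPSS table will differ.

```python
from collections import Counter


def frequency_table(values, missing_codes):
    """Count each valid code; report percent of all cases and of valid cases."""
    valid = [v for v in values if v not in missing_codes]
    counts = Counter(valid)
    n_total, n_valid = len(values), len(valid)
    rows = {
        code: (count, 100 * count / n_total, 100 * count / n_valid)
        for code, count in sorted(counts.items())
    }
    return rows, n_total - n_valid


ethnicgp = [1, 2, 1, 3, 0, 4, 1, 2, 2, 1, 3, 1]  # hypothetical; 0 = not known

rows, n_missing = frequency_table(ethnicgp, missing_codes={0})
for code, (count, percent, valid_percent) in rows.items():
    print(code, count, round(percent, 1), round(valid_percent, 1))
print("missing:", n_missing)
```

The two percentage columns differ only because the missing case is counted in one denominator & not in the other.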
The Output (Viewer) window will simply append output from any process you run. The window is divided into two panes: the left pane contains an outline view of the output contents (rather similar to Windows Explorer), while the right pane contains any statistical output, charts, or tables you generated during your SPSS session. You can scroll up & down the Output window, expand or move it, or edit it, using the Edit menu functions. Or you can click on the outline headers in the left pane to jump straight to a specified section of output. Pressing Delete will delete the section currently highlighted in the outline.
Look at the output from the Frequencies procedure. One subject has a missing value for variable ethnicgp. How does the output table show that there is a missing-value case?
Can you tell from the bar chart that there is a missing-value case? How can you tell?
Run the Frequencies command again, but this time choose a variable for which it makes sense to calculate a mean or median: A truly numeric variable. It can be ordinal or interval-level, but preferably a continuous variable, to provide a clearer contrast with the previous example.
Remove sex & ethnicgp from the right-hand window by highlighting them & clicking the left-pointing arrow. Then move a numeric variable from the left to the right window.
Click Statistics & choose an appropriate measure of central tendency for the variable:
Mean or median?
Look at the other possibilities in this dialogue box, & select some others that might be relevant. Click Charts. This time, choose a Histogram. It is also useful to select With normal curve. Click OK to run the procedure, & compare this output with the preceding one.
How does a histogram differ from a bar chart?
What does the normal curve show?
The Frequencies procedure calculated the mean over all subjects. But often you want to obtain the means, or other summary statistics, separately for subgroups of subjects (e.g., males vs. females). Subgroups can be defined by a categorical, or any other discrete, variable. In this case it would be sex. Compare means allows you to do this.
Menu Bar, select
Compare Means |
Independent variable (i.e., the one which defines the subgroups). In the Dependent list, put all suitable variables (i.e., all the variables for which it makes sense to calculate a mean or median value). One variable is definitely not suitable: Which?
Click Options, which shows that, by default, the mean, number of cases & standard deviation will be computed for each variable. You can select other statistics by moving them from the left- to the right-hand list. Select the Median (because some of the variables may be ordinal), & any others you want. Click
Note: if you defined Value Labels for the two values of sex, these labels appear in the output, making it much easier to understand.
Now compute the median value of any one (or more) of the suitable variables separately for each ethnicgp, & inspect the results.
What happens when a person's ethnic group is not known (missing)?
Note: in real life, & in future classes, you may go on to compute statistical tests of whether, for example, males & females have significantly different means on some numeric variable. It is highly advisable, before running such tests, to compute descriptive statistics for each of the groups, in the way illustrated here. By doing this you can ensure that all relevant cases have been included, missing values have been correctly ignored, & you can see how the means & medians differ from each other, which may not always be obvious from the statistical test output.
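The per-group descriptives described in this section amount to splitting the cases by the grouping variable & summarising each subset. The sketch below is Python, not SPSS syntax; the data values, the coding of sex (1, 2, & 0 for not known) & the function name summarise_by_group are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_group(groups, values, group_missing=frozenset()):
    """Return n, mean and median of `values` for each valid group code."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        if group in group_missing:
            continue  # a case with an unknown group belongs to no subgroup
        buckets[group].append(value)
    return {
        group: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for group, vals in sorted(buckets.items())
    }


sex = [1, 2, 1, 2, 2, 1, 0, 2]   # hypothetical codes; 0 = not known
yrs = [4, 10, 6, 2, 7, 8, 5, 1]  # hypothetical years with the company

summary = summarise_by_group(sex, yrs, group_missing={0})
for group, stats in summary.items():
    print(group, stats)
```

Checking the n for each group against the number of cases you expect is a quick way to confirm that missing values were handled as intended.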
Finally, choose two variables for which you might wish to display a scatterplot, so that you can view any relationship between them &, if appropriate, test for a correlation.
In this week's class we discussed what types of variable are suitable for correlation testing. Correlations test for monotonic relationships (where the direction of change between levels of the variables is constant). If there is a relationship but it's non-monotonic, a correlation test may not be appropriate.
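The point about non-monotonic relationships can be checked numerically. The sketch below is Python, not SPSS; the data are made up, & pearson_r is a plain textbook implementation written for this illustration. It compares a monotonic relationship (y rises whenever x rises) with a U-shaped one.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]        # hypothetical scores
monotonic = [v ** 3 for v in x]     # always increases as x increases
u_shaped = [v ** 2 for v in x]      # falls, then rises

r_monotonic = pearson_r(x, monotonic)
r_u_shaped = pearson_r(x, u_shaped)
print(round(r_monotonic, 3), round(r_u_shaped, 3))
```

In the U-shaped case y is completely determined by x, yet the correlation coefficient gives no hint of it, which is why a scatterplot is worth drawing first.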
Choose a pair of variables for which it is appropriate to draw a scatterplot and, perhaps, test a correlation (hint: only four of the variables in the dataset are suitable).
If testing a correlation between these two variables, is there any reason to prefer a parametric or nonparametric test? (Hint: What is the scale of measurement of each of the variables?)
You can use the Graphs option on the menu to produce many types of chart, including scatterplots. For this to work optimally, the Measure column for each of your variables should be defined as
To draw a simple scatterplot of your two variables, select
Chart Builder |
ScatterPlot (alternatively, use the
The next dialogue box offers many options. The only essential thing is to choose which variables are plotted on the Y (vertical) & X (horizontal) axes, so drag your two variables into the appropriate spaces. There are many other options (e.g., to add a title or vary the legend symbols). Ignore these for the moment. Click OK to display the plot.
Does the plot indicate any monotonic relationship between your variables? If so, is the apparent correlation positive or negative?
If you decide to try a different pair of variables before testing the correlation, do so.
To test the correlation, select
Insert the two variables for which you drew the scatterplot into the Variables box. Correlations can be positive or negative, so you can test a directional hypothesis. Thus, you can choose between two-tailed & one-tailed significance tests. Two-tailed is more usual; a one-tailed test is only relevant if you have a directional hypothesis, & if the correlation is in the specified direction (next week's class will discuss one-tailed tests). For this example, select two-tailed.
You can choose any or all of three tests: Pearson's r (parametric), Kendall's tau-b, Spearman's rho (both nonparametric). For this example, select Pearson's & Spearman's. The dialogue box offers other possibilities; try them if you like. Click OK & look at the output.
There are two output tables, Pearson's (called Correlations) & Spearman's (Nonparametric Correlations). Each cell of the table shows the correlation coefficient, & below it Sig (2-tailed) (i.e., the 2-tailed p-value), & N (sample size after omitting missing values).
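It may help to see how the two coefficients are related: Spearman's rho is Pearson's r computed on the ranks of the data (with tied values sharing their average rank). The sketch below is Python, not SPSS; the data & function names are made up for illustration, & it computes only the coefficients, not the p-values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 upwards; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]      # hypothetical scores
y = [1, 4, 9, 16, 100]   # rises with x, but not in a straight line

print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Here the relationship is perfectly monotonic but not linear, so the rank-based coefficient is larger than Pearson's r; with real data the two are often much closer.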
Why are some of the entries in the table 1.00, with no
What is the Pearson correlation between your two variables, its N & p?
What are the equivalent Spearman correlation statistics?
Is the direction of correlation (positive or negative) as expected from the scatterplot?
Is either p-value significant (i.e., p≤.05)?
How closely do the Pearson and Spearman results agree?
If you are familiar with 1-tailed tests, you may know that (provided the test is appropriate) it is "easier" to obtain significance with a 1-tailed than a 2-tailed test.
To see how this is apparent in the SPSS output, run the same procedure again, but select 1-tailed instead of 2-tailed.
The correlation coefficients & Ns should be the same, but the p-values are labelled Sig (1-tailed) & are different from before.
How can you tell that the 1-tailed tests are "closer to significance" than the 2-tailed ones?
You may have noticed that the 1-tailed p-values are approximately half the 2-tailed ones. This is no accident. It should become clear why when we discuss 1-tailed tests in next week's class.
You can use the Correlate procedure to make a table or matrix of correlations between all suitable variables, as follows.
Run the procedure again. Insert all 4 suitable variables into the box. Select only
Inspect the matrix. Note any correlations that are significant at p≤.05, 2-tailed, & note the relevant details (i.e., the variables, the correlation statistic (including the + or -), N & the p value).
Because SPSS shows exact p-values, we can see which effects "just miss significance" (i.e., have a p-value which is only just greater than .05), & which effects are a long way from significance (much larger p-values). As we will discuss in next week's class, it is often useful to know which effects are marginally significant (i.e., have a p-value which is >.05 but ≤.10).
One correlation is marginally significant. Which?
All other correlations in the table have ps>.10 (i.e., they are a long way from 2-tailed significance).
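The three bands used above (significant, marginally significant, & a long way from significance) can be written as a simple rule. The sketch below is Python, not SPSS; the function name classify_p & the example p-values are hypothetical, while the cut-offs .05 & .10 are the ones given in the text.

```python
def classify_p(p, alpha=0.05, marginal=0.10):
    """Place a 2-tailed p-value into one of the three bands used in the text."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= alpha:
        return "significant"
    if p <= marginal:
        return "marginally significant"
    return "not significant"


example_ps = [0.012, 0.05, 0.051, 0.10, 0.43]  # hypothetical p-values
bands = [classify_p(p) for p in example_ps]
for p, band in zip(example_ps, bands):
    print(p, band)
```

Note that the boundaries are inclusive: p = .05 counts as significant & p = .10 as marginally significant, matching the ≤ signs in the definitions.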
You can save the whole output as a file for later use, & you can print it.
Output files are quite large, especially if they contain charts, so, before saving or printing, check through to decide whether you need it all. To remove a section, click on the section of output, or on its label in the left-pane Outline. Then either press Delete, or use Cut, which allows you to restore it (with Paste) if you change your mind.
Save your output to disk.
SPSS automatically gives the filename an extension of .spv, identifying it as an Output Viewer file. In Windows Explorer, you may not see the extensions .sav & .spv, but the files will have different icons, & may be labelled SPSS Data Document & SPSS Viewer Document respectively.
Note: The output file is in a format specific to SPSS, & the file can only be read into SPSS (& only into the version of SPSS that created the output). It cannot be read directly into a word-processing package. To read a previously saved output back into the Output Viewer window of
You can copy & paste sections of output, including charts, from SPSS into word-processing documents. There's more on this later in the course.
If you restart SPSS, the opening dialogue gives you the choice of typing in new data, or opening an existing data file (Open an existing data source, plus a list of recently used data files. Find your file...).
To read in a data file select
Data. Click at the right-hand end of the Look in box, switch to the correct location, find your file, click on the filename, then click
You can also open saved output files into the Output Viewer window: Select
SPSS can have several output windows open at a time. If you already have an Output Viewer active, & you open another output file, the latter will be opened into a second output window with a different name.
You probably noticed that most SPSS dialogue boxes have a Help button. This provides context-sensitive help. This is the simplest way to get help, but you should also become familiar with the help system more generally.
With SPSS running, click on Help in the main menu bar. The pull-down menu contains various options (e.g., Statistics Coach, which will guide you through the correct statistics to use for your data type(s), & Tutorial on SPSS, which you might find useful some time).
For now, select Topics. This contains a Contents option (contents), Index (alphabetical list), & Search (search for specific words / phrases).
Select Contents. This opens a Contents panel, with a list of items. You can double-click on any of these to display further information about that topic. The Forward arrows in the menu bar move you through pages. Spend a little time familiarising yourself with the help system. When you have finished, close the help window.
There is a simple online help site created by the University of Birmingham: click on The How To Guides, & choose SPSS 10.0 (an early version, but the basic procedures are similar to later versions). From there on it should be easy. Try the Test Spotting Quizzes too.
There are very many books on SPSS. Most of them would not be useful for you: many go into great detail about how to use SPSS, but give little guidance on choosing statistical methods, or else give guidance which is incomplete or misleading, or else only cover elementary methods. The following are recommended, however.
SPSS, Sage, 3rd edition
SPSS information on the book's website
SPSS for beginners. There are many examples & instructions in later chapters.
SPSS 15 made simple. Psychology Press.
SPSS to important statistical concepts. It includes a wide range of SPSS procedures, & some advanced statistical methods; but the coverage of the more advanced methods tends to be incomplete.
SPSS for Windows & Macintosh, Prentice-Hall (Pearson) 5th edition