Teaching home - Course #PYM0S1-2

SPSS #1: Data Entry & Descriptive Statistics

SPSS is a widely used statistical package in social science. Many of you will have used it before. This worksheet gives the basics for those who have not, using version 15 of SPSS (the computers will regularly be updated, but the commands should mostly be the same...).

Contents

  1. Starting SPSS in room G37
  2. SPSS Windows, menu & toolbar
  3. Data set: Workers Survey
  4. Entering data into SPSS
  5. Adding variable labels & value labels
  6. Defining missing values
  7. Other variable characteristics
  8. Saving data
  9. Descriptive statistics. Frequencies: Discrete variables & bar charts
  10. Output & data windows
  11. Frequencies for true numeric variables: Summary statistics & histograms
  12. Compare means: Means for subgroups
  13. Scatterplots & testing correlations
  14. Editing & saving the output, leaving SPSS
  15. Restarting SPSS, retrieving previous data & output files
  16. The SPSS Help system, online help, & books

  1. Starting SPSS in Room G37

    SPSS has different-numbered versions. For this course, we will use whatever version is installed. The computers in G37 have at least one version & may have some older versions as well. All the versions work similarly, & you should easily be able to use an earlier version once you know the latest one. Be aware that the different versions all use the same kind of data file (& syntax file, which you'll learn about in week 3) but they cannot read each other's output files. This is not a problem because you can transfer your work between different versions using data & syntax files.

    To open SPSS, log on to Windows, click the big round Start button, then select All Programs | SPSS Inc | PASW Statistics XX | PASW Statistics XX.

    You'll see a starting box What would you like to do? with various options. You will be entering new data, not reading from a file, so select Type in data, click OK & the SPSS Data Editor screen opens.

    If you have never used SPSS (or have, but want to refresh your memory) it is worth looking over the elements of the SPSS screen & menus before using it. If you are experienced with SPSS, go straight to section 3.

  2. SPSS windows, menus, & toolbar

    When you start SPSS, the first window that is opened for you is the Data Editor: A spreadsheet-like window for defining & entering data. This also includes the Application Window, which has a menu bar & tool bar at the top, underneath the title bar, & a status bar along the bottom. Other windows may be opened for you as appropriate during processing, e.g., an Output window, Syntax, or Chart windows.

    SPSS is menu driven; that is, most of the features can be accessed by selections from the menus. The main menu bar contains ten items, as follows:

    Look at each menu in turn. Note which items appear under each menu. The items listed in bold are those available or appropriate at this point.

    The SPSS for Windows Tool Bar

    The tool bar underneath the menu bar allows you to access many of the most commonly used features. To find out what each of the options along the tool bar does, place the cursor over the tool for a moment & look at the pop-up box, where a short description will be displayed.

    More help with SPSS

    SPSS is an extensive system & you will often want to find help or information that is not in my handouts. The last section of this handout gives guidance on using online help, & recommends some books.

  3. Data set: Workers Survey

    The table below shows part of the results of a survey about workers. Each row (case) contains information (seven variables) about a single person. The variables & their values are defined below.

    Workers survey
    subj  ethnicgp  sex  yrs  satis1  satis2  daysabs
    1111148
    2215229
    3315117
    43115224
    52236430
    60131332
    7112031
    84135443
    91214410
    10111045999
    11227334
    12229353
    Type??????
    ??????

    Which type of variables are these?

    • Interval/Ratio
    • Ordinal
    • Categorical/Nominal

    And, are these variables?

    • Discrete
    • Continuous


  4. Entering data into SPSS

    In SPSS, you define variables & enter data in the Data Editor. The Data Editor is a window that looks like a spreadsheet, with an array of cells in which you enter data. Unlike a spreadsheet, you can only enter data, not formulae.

    The empty Data Editor is illustrated below.

    The Data Editor has two views, Data View & Variable View (see tabs at bottom left of screen). The illustration below shows the Data View.

    This empty window is called Untitled1 [Dataset0]. Versions 14 onwards of SPSS allow you to have more than one Data Editor window open (Dataset0, Dataset1, etc).

    In the Data View, rows are cases (a participant or other unit of observation), & columns are variables. The variables are unnamed by default, labelled simply var. The Workers Survey data have seven variables, so you will use seven columns, naming them after each variable, subj to daysabs.

    The Variable View of the Data Editor is where you define the characteristics of your variables. It is possible to enter data without naming variables, but it is better, & avoids errors, if you start defining the variables first. To do so, click the Variable View tab at bottom left, & you see this:

    In the Variable View, each row corresponds to a variable. In each row you can define the name, type, & other characteristics of the variable.

    Type the names of the first two variables from the data set, subj & ethnicgp, in the first two rows under Name. Variable names can be in lowercase or capitals (lowercase is preferable). We have used variable names that are short, no more than eight characters. After each variable name, press Enter. The screen will now look as follows:

    SPSS inserts the default variable characteristics each time you type in a new variable name. By default, it assumes variables will be of the type numeric, have a maximum Width of eight digits including decimals, & two decimal places. Other parameters are undefined by default.

    For the moment, return to Data View, & you will see the two new variable names at the top of the first two columns.

    Type (or Copy | Paste) in the subj & ethnicgp data from the 12 subjects, copying from the table above. To correct any errors, select the cell & re-type. SPSS automatically gives the data two decimal places, & allows up to eight digits, as specified in the variable characteristics.

    You can leave the Width & Decimal Places as they are, but it's neater to change them. Return to Variable View & select the Type cell for either of the variables (in the illustration above it's selected for ethnicgp). A small button appears at the right-hand end of the cell as shown. Click this & you see a Variable Type dialogue box:

    Change the number of Decimal Places to zero. Change the Width to the maximum number of digits needed for the values of ethnicgp. Click OK. Repeat the process for the variable subj.

    You see that this box allows you to do other things, e.g., change variables from numeric to other types (note that numeric here does not imply that the data are interval data...). We recommend that you generally make your variables numeric, because SPSS statistical procedures don't always work well with other types.

    Return to Data View to see the effect of the changes.

    Before entering more data, go back to Variable View, type in the names of the remaining variables (sex, etc.), & define the Width & Decimal Places. Instead of using the Type dialogue box, you can click on the Width & Decimals cells & change the numbers there. (For any continuous variables, leave Decimals set to two rather than changing it to zero.)

  5. Adding variable labels & value labels

    SPSS allows you to label your variables, which often makes the output of analyses easier to understand. There are two types: Variable labels & Value labels.

    A Variable label is simply a more detailed description of what that variable is (e.g., for the variable yrs you might wish to add the variable label Number of years working for this company). Alternatively, you can use longer variable names, but these can become difficult to work with.

    Value labels are often helpful for a variable which has a small number of values (a discrete variable); for example ethnicgp, whose values are 1, 2, 3, & 4. By adding value labels, you record that value "1" has label "White", value "2" has label "Asian", etc.

    Go to Variable View. If you want to provide a Variable label for yrs, click in the Label column opposite this variable, type in the description & press Enter.

    Value labels, as noted above, are often used for discrete variables, especially where it is not obvious what the values stand for. These may be categorical or ordinal variables.

    To define value labels for ethnicgp, click in the Values cell of that variable, & a button appears (as shown below). Click the button to obtain the Value Labels dialogue box.

    To add the first Value Label:

    1. Click in the Value box
    2. Type 1
    3. Click in the Value Label text box, type "White"
    4. Click on the Add button

    And add the rest of the value labels...

    Note: Value 0 indicates a subject whose ethnic group is not known. Such subjects must be omitted from any analyses that use the variable ethnicgp. In the next section you will define 0 as a Missing Value for ethnicgp. So it is not essential to provide a label for value 0, although it may be helpful. Occasionally, it is important to label missing values, e.g., when there is more than one type of missing value.

    When you have labelled all the values you want to, click OK.

    Repeat the labelling procedure for any other variables in the data set that you wish. They do not all require Value Labels.

    Shortcut: The two satis variables have identical value labels. To save unnecessary typing, you can define the Value Labels for one of them & copy these to the other. Define the labels for satis1, then select its Values cell, click Edit | Copy on the menu, select the Values cell for satis2, & click Edit | Paste. The same labels will be inserted.

    After you have defined Value Labels, you can make SPSS display either the labels or the numeric values in the Data Editor. To control this (& to toggle on/off), click on View | Value Labels.
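
    The idea behind value labels can be sketched outside SPSS. The Python snippet below is only an illustration, not how SPSS stores labels: the function name display_value & the list of codes are made up for the example, & only the two labels stated in this worksheet (1 = White, 2 = Asian) are used.

```python
# Value labels for ethnicgp. Only 1 = "White" and 2 = "Asian" are stated in
# the worksheet, so codes 3 and 4 are deliberately left unlabelled here.
ETHNICGP_LABELS = {1: "White", 2: "Asian"}


def display_value(code, labels, show_labels=True):
    """Return the label for a code when labels are switched on and a label
    exists; otherwise return the numeric code as text. This mimics toggling
    View | Value Labels. (Hypothetical helper, for illustration only.)"""
    if show_labels and code in labels:
        return labels[code]
    return str(code)


codes = [1, 2, 3, 1]  # illustrative ethnicgp codes, not the survey data

with_labels = [display_value(c, ETHNICGP_LABELS) for c in codes]
without_labels = [display_value(c, ETHNICGP_LABELS, show_labels=False) for c in codes]

print(with_labels)     # ['White', 'Asian', '3', 'White']
print(without_labels)  # ['1', '2', '3', '1']
```

    Note that the stored data never change: the labels are only a display layer over the numeric codes, which is why the toggle is harmless.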

  6. Defining missing values

    It is essential to tell SPSS what values of a variable correspond to missing data, e.g., where data were lost, or a participant refused to reply. This will ensure that SPSS omits such cases from any analyses that use that variable. If it tried to include them, the analysis would not make sense.

    SPSS has a standard missing value code (.) for numeric data values that are missing: This is known as a System Missing value. However, it is often helpful to specify user missing values, as follows.

    Looking at the variable descriptions in Section 3, you see that, for many variables (e.g., ethnicgp), 0 signifies a missing value (Not Known), but for one variable (daysabs) a special value, 999, signifies missing values. A value of 0 would not be appropriate for daysabs because it's a possible real value: A person could be absent from work for 0 days in a year. To convey this information to SPSS: In Variable View, click on the Missing cell for the variable & click the button to produce the Missing Values dialogue box.

    This provides options for defining missing values. By default, SPSS treats all values as valid (not missing) so the initial setting is No missing values. You can declare up to three separate, discrete values, & specify these as missing, or a range of missing values. SPSS allows up to three missing values because, e.g., in survey research there might be different reasons for missing values (e.g., refused, not applicable, don't know), & you might want to distinguish these. All of these responses can be given different numeric codes, & Value Labels, & then defined as missing values, allowing these responses to be treated differently in the subsequent analysis.

    In the Workers Survey data set, some variables have 0 as the missing-value or a No Response code, & daysabs has 999 for missing values.

    To specify this as a missing value to SPSS:

    1. Select the Missing cell for variable daysabs & click the button
    2. Click the Discrete missing values radio button
    3. Type 999 in the first discrete missing value box
    4. Click OK

    Specify 0 as a missing value for ethnicgp, using the above procedure.

    Some other variables have 0 specified as a missing value, so insert these as well. Be careful not to use 0 as a missing value if it is a possible real value for that variable.

    It is good practice to define suitable missing values for all variables. (There may not be missing data among these 12 subjects, but you might add other subjects later who do have missing data.) So follow the equivalent steps for all the variables in the data set.
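
    To see why declaring missing values matters, here is a minimal Python sketch (not SPSS itself). It assumes the codes described above (999 for daysabs, 0 for ethnicgp); the helper valid_values & the data values are hypothetical.

```python
from statistics import mean

# User-missing codes as described in the worksheet:
# 999 for daysabs, 0 for ethnicgp.
MISSING_CODES = {"daysabs": {999}, "ethnicgp": {0}}


def valid_values(values, variable):
    """Drop every value declared as user-missing for this variable.
    (Hypothetical helper, for illustration only.)"""
    missing = MISSING_CODES.get(variable, set())
    return [v for v in values if v not in missing]


# Illustrative days-absent values, not the survey data.
# Note the 0: for daysabs it is a real value and must be kept.
daysabs = [8, 9, 7, 999, 0, 4]

naive_mean = mean(daysabs)                # wrongly treats 999 as a real count
valid = valid_values(daysabs, "daysabs")
correct_mean = mean(valid)                # 999 excluded, 0 retained

print("N valid:", len(valid))
print("mean ignoring the missing code:", correct_mean)
print("mean if 999 is left in:", naive_mean)
```

    A single undeclared 999 is enough to make the mean meaningless, which is the situation the Missing Values dialogue box is designed to prevent.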

    Displaying Variable Information. To check the characteristics of any of the variables on your file, look at the Variable View screen, or go to Data View, select Utilities | Variables. This displays a dialogue box of all the variables you have specified. To choose one of the variables, click on the variable name on the left hand side of the dialogue box, so that it is highlighted. Information about that variable is then displayed on the right hand side of the dialogue box.

  7. Other variable characteristics

    If you move or scroll over to the right side of the Variable View screen, you find three more characteristics not yet discussed: Columns, Align, & Measure.

    Columns is optional (it sets the maximal width of the column, e.g. if you need more space to display the variable name). Align can be left as it is.

    Measure should be looked at. There are three options: You can specify the level of measurement of the variable as scale (i.e., interval- or ratio-scale numeric data), ordinal, or nominal. Click on each Measure cell, & enter a choice. Insert the appropriate values for each variable in this column. If you don't enter these, SPSS will "guess" which levels of measurement your variables have, and will not necessarily guess correctly. It is useful to record the correct information about your variable, because the SPSS help uses this terminology in explanations, & the graph-drawing procedures expect it.

  8. Saving data

    Before continuing, save your data file to your drive (File | Save As). SPSS data files have the extension .sav.

  9. Descriptive statistics. Frequencies: Discrete variables & bar charts

    The Menu Bar option Analyze offers a wide range of procedures. We will first look at the Frequencies command, which comes under Analyze | Descriptive Statistics.

    Frequencies is a useful general-purpose procedure, which produces selected descriptive statistics (central tendency, dispersion, skewness, & kurtosis) & charts (e.g., bar charts) for individual variables. You must decide what are the suitable descriptives or charts for the variable chosen. In this section, we illustrate its use for discrete (especially categorical) variables.

    Select Analyze | Descriptive Statistics | Frequencies

    On the left side of the dialogue box is a list of all your variables. In this example, the variables are listed in the same order as in the Data Editor. They may appear differently on different machines (e.g., in alphabetical order). In the list, highlight sex & ethnicgp. To highlight more than one item, click the first, then hold down the Ctrl key while clicking others.

    Why are sex and ethnicgp the most suitable variables for this procedure?

    Click on the right-pointing arrow, which transfers the highlighted variables to the list on the right. Now click on the Charts button and select Bar Chart. Click Continue, which returns you to the Frequencies dialogue box.

    Click the Statistics button which allows you to display various summary statistics. Some of these (e.g., medians, means) only make sense for truly numeric (ordinal or interval) variables, & not for categorical variables such as sex. Others (e.g., quartiles) are applicable to distributions of any variable, but they are of little use with this very small sample. So leave all statistics un-selected, & click Continue. To run the Frequencies procedure, click OK.
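
    What the Frequencies procedure computes for a discrete variable can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the SPSS output: the function frequency_table & the codes are invented for the example, & 0 is treated as the Not Known code for ethnicgp.

```python
from collections import Counter


def frequency_table(values, missing=()):
    """Return (rows, n_missing), where each row is
    (value, count, valid percent). Missing codes are counted separately
    and excluded from the percentages. (Hypothetical helper.)"""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = [(value, n, 100.0 * n / len(valid))
            for value, n in sorted(counts.items())]
    return rows, len(values) - len(valid)


ethnicgp = [1, 2, 3, 3, 2, 0, 1, 4]  # illustrative codes; 0 = not known

rows, n_missing = frequency_table(ethnicgp, missing={0})
for value, n, pct in rows:
    # One '#' per case gives a crude text version of a bar chart.
    print(f"{value}: {'#' * n} {n} ({pct:.1f}%)")
print("Missing:", n_missing)
```

    The bars have gaps between them & one bar per category, which is what distinguishes a bar chart for a discrete variable from a histogram.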

  10. Output & data windows

    The output of the procedure, including the chart, will be written to the Output1 - SPSS viewer window.

    The Output (or Viewer) window will simply append output from any process you run. The window is divided into two panes: the left pane contains an outline view of the output contents (rather similar to Windows Explorer), while the right pane contains any statistical output, charts, or tables you generated during your SPSS session. You can scroll up and down the Output window, expand or move it, or edit it, using the Edit menu functions. Or you can click on the outline headers in the left pane to jump straight to a specified section of output. Pressing Delete will delete the section currently highlighted in the outline.

    Look at the output from the Frequencies procedure. One subject has a missing value for variable ethnicgp. How does the output table show that there is a missing-value case?

    Can you tell from the bar chart that there is a missing-value case? How can you tell?

  11. Frequencies for true numeric variables: Summary statistics & histograms

    Run the Frequencies command again, but this time choose a variable for which it makes sense to calculate a mean or median: A truly numeric variable. It can be ordinal or interval-level, but preferably a continuous variable, to provide a clearer contrast with the previous example.

    Remove sex & ethnicgp from the right-hand window by highlighting them & clicking the left-pointing arrow. Then move a numeric variable from the left to the right window.

    Next, click Statistics & choose an appropriate measure of central tendency for the variable:

    Mean or median?

    Look at the other possibilities in this dialogue box, & select some others that might be relevant. Click Continue, then Charts. This time, choose a Histogram. It is also useful to select With normal curve. Click Continue, then OK to run the procedure, and compare this output with the preceding one.

    How does a histogram differ from a bar chart?

    What does the normal curve show?
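
    As a rough illustration of why the choice between mean & median matters, the Python sketch below uses made-up years-of-service values (not the survey data) containing one extreme case.

```python
from statistics import mean, median, stdev

# Illustrative years of service with one extreme value; not the survey data.
yrs = [1, 5, 5, 2, 7, 9, 30]

yrs_mean = mean(yrs)
yrs_median = median(yrs)
yrs_sd = stdev(yrs)  # sample standard deviation (n - 1 in the denominator)

print("mean   =", round(yrs_mean, 2))
print("median =", yrs_median)
print("sd     =", round(yrs_sd, 2))
```

    With these values the mean is pulled above the median by the single long-serving case, so the median is the better summary of a typical worker. A histogram with a normal curve makes this kind of skew visible at a glance.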

  12. Compare means: Means (etc.) for subgroups

    The previous Frequencies procedure calculated the mean over all subjects. But often you want to obtain the means, or other summary statistics, separately for subgroups of subjects (e.g., males vs. females). Subgroups can be defined by a categorical, or any other discrete, variable. In this case it would be sex. Compare means allows you to do this.

    From the Menu Bar, select Analyze | Compare Means | Means.

    Make sex the Independent variable (i.e., the one which defines the subgroups). In the Dependent list, put all suitable variables (i.e., all the variables for which it makes sense to calculate a mean or median value). One variable is definitely not suitable: Which?

    Click Options which shows that, by default, the mean, number of cases & standard deviation will be computed for each variable. You can select other statistics by moving them from the left- to the right-hand list. Select the Median (because some of the variables may be ordinal), & any others you want. Click Continue, then OK.

    Note: if you defined Value Labels for the two values of sex, these labels appear in the output, making it much easier to understand.

    Now compute the median value of any one (or more) of the suitable variables separately for each ethnicgp, & inspect the results.

    What happens when a person's ethnic group is not known (missing)?

    Note: in real life, & in future classes, you may go on to compute statistical tests of whether, for example, males & females have significantly different means on some numeric variable. It is highly advisable, before running such tests, to compute descriptive statistics for each of the groups, in the way illustrated here. By doing this you can ensure that all relevant cases have been included, missing values have been correctly ignored, & you can see how the means & medians differ from each other, which may not always be obvious from the statistical test output.
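
    The logic of Compare Means (split the cases by the independent variable, then summarise the dependent variable within each group) can be sketched in Python as follows. The function summarise_by_group & the records are hypothetical; SPSS may lay out its table differently.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Illustrative (sex code, yrs) records; not the survey data.
records = [(1, 1), (1, 5), (1, 5), (2, 7), (2, 9), (2, 3)]


def summarise_by_group(pairs):
    """Group the second element of each pair by the first, then report
    N, mean, median & standard deviation per group. (Hypothetical helper.)"""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {
        g: {
            "n": len(v),
            "mean": mean(v),
            "median": median(v),
            # A standard deviation needs at least two cases.
            "sd": stdev(v) if len(v) > 1 else None,
        }
        for g, v in sorted(groups.items())
    }


summary = summarise_by_group(records)
for group, stats in summary.items():
    print(group, stats)
```

    Checking the N in each group is the quickest way to confirm that every relevant case was included & that missing values were dropped as intended.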

  13. Scatterplots & testing correlations

    Finally, choose two variables for which you might wish to display a scatterplot, so that you can view any relationship between them &, if appropriate, test for a correlation.

    In this week's class we discussed what types of variable are suitable for correlation testing. Correlations test for monotonic relationships (where the direction of change between levels of the variables is constant). If there is a relationship but it's non-monotonic, a correlation test may not be appropriate.

    Choose a pair of variables for which it is appropriate to draw a scatterplot and, perhaps, test a correlation (hint: only four of the variables in the dataset are suitable).

    If testing a correlation between these two variables, is there any reason to prefer a parametric or nonparametric test? (Hint: What is the scale of measurement of each of the variables?)

    You can use the Graphs option on the menu to produce many types of chart, including scatterplots. For this to work optimally, the Measure column for each of your variables should be defined as Nominal, Ordinal, or Scale.

    To draw a simple scatterplot of your two variables, select Graphs | Chart Builder | ScatterPlot (alternatively, use the Legacy dialogues).

    The next dialogue box offers many options. The only essential thing is to choose which variables are plotted on the Y (vertical) & X (horizontal) axes, so drag your two variables into the appropriate spaces. There are many other options (e.g., to add a title or vary the legend symbols). Ignore these for the moment. Click OK to display the plot.

    Does the plot indicate any monotonic relationship between your variables? If so, is the apparent correlation positive or negative?

    If you decide to try a different pair of variables before testing the correlation, do so.

    To test the correlation, select Analyze | Correlate | Bivariate.

    Insert the two variables for which you drew the scatterplot, into the Variables box. Correlations can be positive or negative, so you can test a directional hypothesis. Thus, you can choose between two-tailed & one-tailed significance tests. Two-tailed is more usual; a one-tailed test is only relevant if you have a directional hypothesis, & if the correlation is in the specified direction (next week's class will discuss one-tailed tests). For this example, select two-tailed.

    You can choose any or all of three tests: Pearson's r (parametric), Kendall's tau-b, & Spearman's rho (both nonparametric). For this example, select Pearson's & Spearman's. The dialogue box offers other possibilities; try them if you like. Click OK and look at the output.

    There are two output tables, Pearson's (called Correlations) & Spearman's (Nonparametric Correlations). Each cell of the table shows the correlation coefficient, & below it Sig (2-tailed) (i.e., the 2-tailed p-value), & N (sample size after omitting missing values).
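
    The two coefficients can be computed by hand to see how they differ. The Python sketch below is a minimal illustration, assuming pairwise deletion of missing values & the usual definition of Spearman's rho as Pearson's r on ranks (ties given their average rank). All function names & data are made up, & p-values are not computed because they need a t distribution, which the standard library does not provide.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def pairwise_complete(x, y, missing_x=(), missing_y=()):
    """Keep only the cases where neither value is a missing code."""
    pairs = [(a, b) for a, b in zip(x, y)
             if a not in missing_x and b not in missing_y]
    return [a for a, _ in pairs], [b for _, b in pairs]


# Illustrative data: y rises with x but not in a straight line,
# & the last y value is the missing code 999.
x_raw = [1, 2, 3, 4, 5, 6]
y_raw = [1, 4, 9, 16, 25, 999]

x, y = pairwise_complete(x_raw, y_raw, missing_y={999})
n = len(x)
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print("N =", n, " Pearson r =", round(r, 3), " Spearman rho =", round(rho, 3))
```

    Here the relationship is perfectly monotonic but curved, so Spearman's rho reaches 1 while Pearson's r falls slightly short of it. N is the number of cases left after omitting the missing value.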

    Why are some of the entries in the table 1.00, with no Sig value?

    What is the Pearson correlation between your two variables, its N & p?

    What are the equivalent Spearman correlation statistics?

    Is the direction of correlation (positive or negative) as expected from the scatterplot?

    Is either p-value significant (i.e., p≤.05)?

    How closely do the Pearson and Spearman results agree?

    If you are familiar with 1-tailed tests, you may know that (provided the test is appropriate) it is "easier" to obtain significance with a 1-tailed than a 2-tailed test.

    To see how this is apparent in the SPSS output, run the same procedure again, but select 1-tailed instead of 2-tailed.

    The correlation coefficients and Ns should be the same, but the p-values are labelled Sig (1-tailed) & are different from before.

    How can you tell that the 1-tailed tests are "closer to significance" than the 2-tailed ones?

    You may have noticed that the 1-tailed p-values are approximately half the 2-tailed ones. This is no accident. It should become clear why when we discuss 1-tailed tests in next week's class.

    You can use the Correlate procedure to make a table or matrix of correlations between all suitable variables, as follows.

    Run the procedure again. Insert all 4 suitable variables into the box. Select only Pearson & 2-tailed.

    Inspect the matrix. Note any correlations that are significant at p≤.05, 2-tailed, & note the relevant details (i.e., the variables, the correlation statistic (including the + or -), N, & the p-value).

    Because SPSS shows exact p-values, we can see which effects "just miss significance" (i.e., have a p-value which is only just greater than .05), & which effects are a long way from significance (much larger p-values). As we will discuss in next week's class, it is often useful to know which effects are marginally significant (i.e., have a p-value which is >.05 but ≤.10).

    One correlation is marginally significant. Which?

    All other correlations in the table have ps>.10 (i.e., they are a long way from 2-tailed significance).
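
    The three bands used above (p≤.05, >.05 but ≤.10, & >.10) can be written as a simple rule. This Python sketch only restates those cut-offs; the function name & the example p-values are hypothetical, not results from the Workers Survey.

```python
def significance_label(p):
    """Classify a 2-tailed p-value using the worksheet's cut-offs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "marginally significant"
    return "not significant"


# Hypothetical p-values, chosen to fall in each band.
example_ps = [0.012, 0.05, 0.051, 0.10, 0.43]
labels = [significance_label(p) for p in example_ps]
for p, label in zip(example_ps, labels):
    print(p, "->", label)
```

    Note that both boundaries are inclusive, so p = .05 counts as significant & p = .10 as marginally significant.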

  14. Editing & saving output, leaving SPSS

    You can save the whole output as a file for later use, & you can print it.

    Output files are quite large, especially if they contain charts, so, before saving or printing, check through to decide whether you need it all. To remove a section, click on the section of output, or on its label in the left-pane Outline. Then either press Delete, or use Edit | Cut, which allows you to restore it (with Edit | Paste) if you change your mind.

    Save your output to disk. SPSS automatically gives the filename an extension of .spo or .spv, identifying it as an SPSS Output Viewer file. In Windows Explorer, you may not see the extensions .sav & .spo, but they will have different icons, & may be labelled SPSS Data Document & SPSS Viewer Document respectively.

    Note: The output file is in a format specific to SPSS, & the file can only be read into SPSS (& only into the version of SPSS that created the output). It cannot be read directly into a word-processing package. To read a previously saved output back into the Output Viewer window of SPSS, select File | Open | Output.

    You can copy & paste sections of output, including charts, from SPSS into word-processing documents. There's more on this later in the course.

  15. Restarting SPSS, retrieving previous data & outputs

    If you restart SPSS, the opening dialogue gives you the choice of typing in new data or opening an existing data file (Open an existing data source, plus a list of recently used data files). Find your file...

    To read in a data file select File | Open | Data. Click at the right-hand end of the Look in box, switch to the correct location, find your file, click on the filename, then click OK.

    You can also open saved output files into the Output Viewer window: Select File | Open | Output. SPSS can have several output windows open at a time. If you already have an Output Viewer active, & you open another output file, the latter will be opened into a second output window with a different name.

  16. The SPSS help system, online help, & books

    SPSS help system

    You probably noticed that most SPSS dialogue boxes have a Help button. This provides context-sensitive help & is the simplest way to get help, but you should also become familiar with the help system more generally.

    With SPSS running, click on Help in the main menu bar. The pull-down menu contains various options (e.g., Statistics Coach, which will guide you through the correct statistics to use for your data type(s), & Tutorial on SPSS, which you might find useful some time).

    For now, select Topics. This contains Contents (a table of contents), Index (an alphabetical list), & Search (to search for specific words or phrases).

    Select Contents. This opens a Contents panel, with a list of items. You can double-click on any of these to display further information about that topic. The Back & Forward arrows in the menu bar move you through pages. Spend a little time familiarising yourself with the help system. When you have finished, close the help window.

    Other online help with SPSS

    There is a simple online help site created by the University of Birmingham: click on The How To Guides & choose SPSS 10.0 (an early version, but the basic procedures are similar to later versions). From there on it should be easy. Try the Test Spotting Quizzes too.

    Books on SPSS

    There are very many books on SPSS. Most of them would not be useful for you: many go into great detail about how to use SPSS but give little guidance on choosing statistical methods, give guidance which is incomplete or misleading, or cover only elementary methods. The following are recommended, however.

