Creating an "ID" variable in SPSS Statistics

Introduction

An ID variable, also known as a case identifier or subject/participant ID, is often used in SPSS Statistics to distinguish between "cases" in your data set. Cases may be people (e.g., participants in your study), objects, animals or something else that you have measured.

In this guide, we show you how to create an ID variable using the Compute Variable... procedure in SPSS Statistics. First, we set out the example used in this guide.

Note: The data setup and procedure to create an ID variable are identical for all versions of SPSS Statistics. However, version 27 introduced a new look for the interface called "SPSS Light", replacing the previous look, "SPSS Standard", which was used in version 26 and earlier. Therefore, if you are using SPSS Statistics version 26 or earlier, the images in this guide will be blue rather than light grey.


Data setup in SPSS Statistics

We want to carry out a multiple regression to predict maximal aerobic capacity (VO₂max), which is an indicator of fitness and health, based on a person's age, weight, heart rate and gender. In this study, the dependent variable is "VO₂max" and the four independent variables are "age", "weight", "heart rate" and "gender". Data were collected from the 100 participants who took part in the research.

Explanation: An ID variable is not used directly in calculations for a multiple regression analysis. However, when testing the assumptions of a multiple regression, such as detecting outliers in the data, it becomes easier to identify such outliers when an ID variable has been created.
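To illustrate why an ID variable makes outliers easier to trace, here is a minimal Python sketch rather than an SPSS Statistics procedure. The scores are hypothetical, and the cut-off of two standard deviations is an assumption chosen purely for illustration; regression diagnostics in practice are usually based on residuals rather than raw scores.

```python
from statistics import mean, stdev

# Hypothetical VO2max scores (ml/min/kg), one per case; not the guide's data.
vo2max = [55.79, 48.2, 51.3, 47.9, 50.4, 49.8, 52.1, 12.5, 50.9, 49.1]

# Give each case a sequential ID starting at 1, one ID per row.
cases = [{"ID": i, "VO2max": v} for i, v in enumerate(vo2max, start=1)]

m = mean(vo2max)
s = stdev(vo2max)  # sample standard deviation

# Flag cases lying more than 2 standard deviations from the mean
# (an illustrative cut-off, not a recommendation).
outlier_ids = [c["ID"] for c in cases if abs(c["VO2max"] - m) / s > 2]

print(outlier_ids)
```

Because each flagged value carries its ID, you can go straight back to the corresponding row in the data set instead of searching for the score itself.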

To set up these variables, SPSS Statistics has a Variable View where you define the types of variables you are analysing and a Data View where you enter your data for these variables. The data setup in the Variable View is shown below:

Published with written permission from SPSS Statistics, IBM Corporation.

Each of our five variables is displayed on a separate row (i.e., the dependent variable, VO2max, on row 1, and the four independent variables, age, weight, heart_rate and gender, on rows 2, 3, 4 and 5, respectively).

After you use the Compute Variable... procedure in SPSS Statistics later in this guide to create an ID variable, ID, this sixth variable will appear in our Variable View, as highlighted below:

[Image: Variable View with the new ID variable highlighted]

On row 6, you can see that the ID variable, ID, is a nominal variable (i.e., "Nominal" in the cell under the Measure column) with no decimal places (i.e., "0" in the cell under the Decimals column). We have also given it the label, "Participant ID" (i.e., in the cell under the Label column), so that it is clear what the ID variable represents. In our example, the ID variable represents each of the 100 participants who took part in the research. If your ID variable represents something different (perhaps an object, an animal, a series of variables, etc.), you may want to label it accordingly. Finally, the cell under the Role column is set to None.

Note: We suggest changing the cell under the Role column from Input to None, but you do not have to make this change. We suggest it because there are certain analyses in SPSS Statistics where the Input setting results in your variables being automatically transferred into certain fields in the dialogue boxes you use to carry out your analysis. Since you may not want these variables to be transferred, changing the setting to None stops this from happening automatically.

Based on the file setup for the five variables in the Variable View above (i.e., without the ID variable), the Data View window will look as follows:


Our five variables are displayed in the columns of the Data View in the order we entered them into the Variable View window. Therefore, in our example, we first entered the dependent variable, VO2max, so this appears in the first column, entitled VO2max, followed by the four independent variables from left to right: age in the second column, entitled age, and then weight, heart_rate and gender in the weight, heart_rate and gender columns, respectively.

Note: The setup above is known as wide format because there is one case per row. This means that each row contains the data for a single case; that is, each row contains the scores for a single participant for each of the variables that have been measured. If your data is set up differently (i.e., in long format), the SPSS Statistics procedure shown in this guide will not help you to create an ID variable. If this is the case, please contact us.
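The difference between the two layouts can be sketched in Python. This is only an illustration of why numbering rows does not produce a case identifier in long format. The data and the `participant` key are hypothetical, and the sketch assumes that the long-format file already contains some field that says which rows belong to the same case.

```python
# Hypothetical long-format data: each participant appears on several rows.
long_rows = [
    {"participant": "A", "variable": "age", "value": 27},
    {"participant": "A", "variable": "weight", "value": 70.47},
    {"participant": "B", "variable": "age", "value": 31},
    {"participant": "B", "variable": "weight", "value": 82.10},
]

# Numbering the rows gives every row its own number ...
row_numbers = [i for i, _ in enumerate(long_rows, start=1)]

# ... whereas a case ID must be shared by all rows of the same participant.
seen = {}
case_ids = [seen.setdefault(r["participant"], len(seen) + 1) for r in long_rows]

print(row_numbers)  # [1, 2, 3, 4]
print(case_ids)     # [1, 1, 2, 2]
```

In wide format the two lists would be identical, which is why a simple row number is enough there.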

SPSS Statistics procedure to create an "ID" variable

In this section, we explain how to create an ID variable, ID, using the Compute Variable... procedure in SPSS Statistics. The following procedure will only work when you have set up your data in wide format where you have one case per row (i.e., your Data View has the same setup as our example, as explained in the note above):

1. Click Transform > Compute Variable... on the main menu, as shown below:

Note: Depending on your version of SPSS Statistics, you may not have the same options under the Transform menu as shown below, but all versions of SPSS Statistics include the same option that you will use to create an ID variable.

[Image: Compute Variable... option on the Transform menu, used to create a new ID variable]

You will be presented with the Compute Variable dialogue box, as shown below:

[Image: Compute Variable dialogue box displayed]

2. Enter the name of the ID variable you want to create into the Target Variable: box. In our example, we have called this new variable "ID", as shown below:

[Image: ID variable entered into the Target Variable: box in the top left]

3. Click on the Type & Label... button and you will be presented with the Compute Variable: Type and Label dialogue box, as shown below:

[Image: Empty Compute Variable: Type and Label dialogue box]

4. Enter a more descriptive label for your ID variable into the Label: box in the –Label– area (e.g., "Participant ID"), as shown below:

[Image: "Participant ID" entered in the Compute Variable: Type and Label dialogue box]

Note: You do not have to enter a label for your new ID variable, but we prefer to make sure we know what a variable is measuring (this is especially useful when working with larger data sets with lots of variables). Therefore, we entered the label, "Participant ID", into the Label: box. This will be the label entered in the Label column in the Variable View of SPSS Statistics when you complete the steps below.

5. Click on the Continue button. You will be returned to the Compute Variable dialogue box, as shown below:


6. Enter the numeric expression, $CASENUM, into the Numeric Expression: box, as shown below:


Explanation: The numeric expression, $CASENUM, instructs SPSS Statistics to add a sequential number to each row of the Data View. The sequential numbers start at "1" in row 1, then "2" in row 2, "3" in row 3, and so forth. Therefore, since we have 100 participants in our example, the sequential numbers go from "1" in row 1 through to "100" in row 100.

Note: Instead of typing in $CASENUM, you can click on "All" in the Function group: box, followed by "$Casenum" from the options that then appear in the Functions and Special Variables: box. Finally, click on the arrow button. The numeric expression, $CASENUM, will appear in the Numeric Expression: box.

7. Click on the OK button and the new ID variable, ID, will be added to our data set, as highlighted in the Data View window below:

[Image: Data View with the new nominal ID variable highlighted]

If you look under the ID column in the Data View above, you can see that a sequential number has been added to each row, starting with "1" in row 1, then "2" in row 2, "3" in row 3, and so forth. Since we have 100 participants in our example, the sequential numbers go from "1" in row 1 through to "100" in row 100.

Therefore, participant 1 along row 1 had a VO₂max of 55.79 ml/min/kg (i.e., in the cell under the VO2max column), was 27 years old (i.e., in the cell under the age column), weighed 70.47 kg (i.e., in the cell under the weight column), had an average heart rate of 150 (i.e., in the cell under the heart_rate column) and was male (i.e., in the cell under the gender column).
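For readers who want to see the logic behind the numbering, the effect can be mimicked with a short Python sketch. It is not SPSS Statistics code and does not reproduce how $CASENUM is implemented. The function name `add_case_ids` is hypothetical, the first row uses the values quoted for participant 1 in this guide, and the second row is made up.

```python
# First row: values quoted in the guide for participant 1.
# Second row: made-up values, included only so there is more than one case.
rows = [
    {"VO2max": 55.79, "age": 27, "weight": 70.47, "heart_rate": 150, "gender": "male"},
    {"VO2max": 47.20, "age": 34, "weight": 81.30, "heart_rate": 162, "gender": "female"},
]


def add_case_ids(rows, name="ID"):
    """Return copies of the rows with a 1-based sequential ID appended."""
    return [{**row, name: number} for number, row in enumerate(rows, start=1)]


with_ids = add_case_ids(rows)
for row in with_ids:
    print(row["ID"], row["VO2max"])
```

The ID is added as the last key of each row, which mirrors the new variable being appended after the existing variables, and the original rows are left unchanged.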

The new variable, ID, will also now appear in the Variable View of SPSS Statistics, as highlighted below:

[Image: Variable View with the new nominal ID variable highlighted]

The name of the new variable, "ID" (i.e., under the Name column), reflects the name you entered into the Target Variable: box of the Compute Variable dialogue box in Step 2 above. Similarly, the label of the new variable, "Participant ID" (i.e., under the Label column), reflects the label you entered into the Label: box in the –Label– area in Step 4 above.

You may also notice that we have made changes to the Decimals, Measure and Role columns for our new variable, "ID". When the new variable is created, by default in SPSS Statistics the Decimals column will be set to "2" (i.e., two decimal places), the Measure column will show Scale and the Role column will show Input. We changed the number of decimal places in the Decimals column from "2" to "0" because an ID variable does not require any decimal places. Next, we changed the level of measurement from the default entered by SPSS Statistics, Scale, to Nominal, because our new ID variable is a nominal variable (i.e., a Nominal variable) and not a continuous variable (i.e., not a Scale variable). Finally, we changed the cell under the Role column from the default, Input, to None, for the same reasons mentioned in the note above.
